Abstract

We consider a worst-case asymmetric distributed source coding problem where an information sink communicates with $N$
correlated information sources to gather their data. A data-vector $x = (x_1, \ldots, x_N) \sim \mathcal{P}$ is derived from a discrete and finite
joint probability distribution $\mathcal{P} = p(x_1, \ldots, x_N)$ and component $x_i$ is revealed to the $i$th source, $1 \le i \le N$. We consider an
asymmetric communication scenario where only the sink is assumed to know distribution $\mathcal{P}$. We are interested in computing the
minimum number of bits that the sources must send, in the worst-case, to enable the sink to losslessly learn any $x$ revealed to
the sources.

We propose a novel information measure called information ambiguity to perform the worst-case information-theoretic analysis 
and prove its various properties. Then, we provide interactive communication protocols to solve the above problem in two different 
communication scenarios. We also investigate the role of block-coding in the worst-case analysis of distributed compression 
problem and prove that it offers almost no compression advantage compared to the scenarios where this problem is addressed, as 
in this paper, with only a single instance of data-vector. 

Index Terms 

Distributed Compression; Interactive Communication; Wireless Sensor Networks; Generalized Information Theory;
Information Measures



I. Introduction 

More than sixty years have passed since Claude Shannon proposed his formulation of information theory [1]. In the intervening
years, information theory has found relevance in disciplines as diverse as communication theory, theory of computation,
physics, neural information processing systems, statistical inference and learning, and control theory, to name a few. Although
the origins of Shannon information lie in the search for solutions to specific compression and communication problems,
Shannon's information measure has been used in attempts to solve almost all compression and communication
problems. However, applications of information theory to complex communication scenarios in diverse disciplines have led to only
a few instances of success [2].

For the information theory to fulfill its promise as a systems notion, it must be able to produce successful results in 
diverse communication systems composed of heterogeneous communicating agents. Taking a cue from Shannon's solution to 
his original problem to understand conditions under which error free transmission of messages can take place between an 
information source and sink, we attempt to pose problems in communication scenarios as attempts to reach a set of design 
objectives the system must satisfy. In doing so, we come across useful recurring quantities that look like information measures, 
as in Shannon's source and channel coding theorems. 

Posing the problem of information transfer in a communication system as a question of whether a set of design specifications 
can be met also allows us to characterize arbitrarily complex communication scenarios in terms of very general optimization 
problems that include various system-level constraints, such as energy, computation and communication resources and delay- 
tolerance; and system characteristics, such as interaction among constituent agents and lack of global-knowledge of agents. 
Real systems often operate under such constraints and possess such characteristics. However, current information-theoretic 
approaches and methods often ignore such system constraints and characteristics while attempting to address more realistic 
models of real-world systems. This disregard for such system-level details while still attempting to use the classical (Shannon) 
information-theoretic results in various communication scenarios is, according to us, the primary reason for the apparent failure 
of the information theory in making meaningful contributions to various disciplines. 

We attempt to find systematic and principled generalizations of the information theory which take into account the various 
resource constraints and general characteristics mentioned above. We argue that such generalizations are essential to analyze 
realistic models of real world compression and communication systems of diverse kinds and may result in new definitions 
of information, novel measures to quantify information transfer, and new variants of classical information-theoretic problems 
to model a wider class of real systems. Recently, the need for such generalizations has been realized, resulting in some new
approaches in this direction [3]-[5].

We concern ourselves with one such generalization of the classical information-theoretic problem of Distributed Source 
Coding (DSC). We first propose a new canonical scheme to classify numerous variants of the classical DSC problem. Then 
we consider one such variant and introduce a new information measure to aid in the analysis of this variant. 
Fig. 1. Slepian-Wolf Distributed Source Coding problem.



A. Distributed Source Coding 

Shannon's source coding theorem states that if an information source observes a random variable $X \sim p(x)$, then it needs
to send at least $H(X)$ bits, on average, so that the information sink can losslessly recover $X$ [6]. Generalized to a pair of
random variables $(X, Y)$, it states that if an information source observes $(X, Y) \sim \mathcal{P} = p(x, y)$
and jointly encodes them, then $H(X, Y)$ bits from the information source, on average, are sufficient for the information sink
to losslessly recover $(X, Y)$, as in Figure 1.

However, the authors in [7] proved the surprising result that if two correlated random variables $(X, Y) \sim \mathcal{P} = p(x, y)$
are observed by two non-cooperating information sources, and $X$ and $Y$ are independently encoded, then a total of
$H(X, Y)$ bits from the sources, on average, is still sufficient for the sink to losslessly recover $(X, Y)$, provided the individual
information rates for $X$ and $Y$ are at least $H(X|Y)$ and $H(Y|X)$, respectively, as in Figure 1.
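
The three quantities in the Slepian-Wolf statement can be computed directly from a joint distribution. The following is a minimal sketch, not taken from the paper: the function names and the toy distribution (two binary sources that agree with probability 0.9) are assumptions made purely for illustration.

```python
from math import log2

def entropy(probs):
    """Shannon entropy, in bits, of an iterable of probabilities."""
    return -sum(p * log2(p) for p in probs if p > 0)

def slepian_wolf_quantities(joint):
    """joint maps (x, y) -> p(x, y). Returns (H(X,Y), H(X|Y), H(Y|X)) in bits."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p  # marginal of X
        py[y] = py.get(y, 0.0) + p  # marginal of Y
    h_xy = entropy(joint.values())
    # Chain rule: H(X|Y) = H(X,Y) - H(Y) and H(Y|X) = H(X,Y) - H(X).
    return h_xy, h_xy - entropy(py.values()), h_xy - entropy(px.values())

# Hypothetical toy distribution: Y equals X with probability 0.9.
joint = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}
h_xy, h_x_given_y, h_y_given_x = slepian_wolf_quantities(joint)
print(f"H(X,Y)={h_xy:.3f}  H(X|Y)={h_x_given_y:.3f}  H(Y|X)={h_y_given_x:.3f}")
```

In this toy case each source alone would need one bit per sample, while the sum rate $H(X, Y)$ is well below two bits, which is the saving that the average-case theory promises and that the worst-case analysis of this paper revisits.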

Fundamentally, the information transfer in any communication system where a sink node is interested in collecting information
from a set of correlated information sources, which do not communicate among themselves, can be modeled as a distributed source
coding problem. The communication system in this context can be a model of communication in a neural information processing
system, a learning and estimation system, or a wireless communication system. However, the set of constraints and characteristics
of the particular communication system and the objectives of communication in the system often determine the corresponding
variant of the distributed source coding problem that is most appropriate to model the system in question. The resultant variants
differ from each other not only in terms of the problem definition, but also in terms of the computation and communication
complexities of their optimal solutions.

Since the publication of the seminal paper of Slepian and Wolf, various attempts have been made towards solving the distributed
source coding problem, such as [8]-[11] and the references therein. However, there is no single definition of the DSC problem.
In the absence of any unified framework to systematically generate and address the different variants of the distributed source
coding problem, it is often difficult to compare and reconcile the approaches and solutions of different variants. We propose a
canonical framework to construct and address such different variants. The proposed framework classifies each DSC problem
variant according to the subset of assumptions and objectives used to define the problem variant. The sets of particular assumptions
and objectives we consider are as follows.

Assumptions: 

• Symmetric or asymmetric communication corresponding to presence or absence of global knowledge at the information 
sources, respectively. 

• Interactive (with limited or unlimited number of messages) or non-interactive communication between the sink and the 
sources. 

• Serial or parallel communication from the sources to the sink. 

• Block-encoding of data-samples at the sources.

Objectives:

• Lossy or lossless data-gathering at the sink.

• Worst-case or average-case performance analysis.

• Function of the number of bits communicated by a subset of communicating agents (the sink and/or the sources) that is to
be minimized. Examples of such functions frequently used in defining DSC problem variants are sum and max.

• The subset of communicating agents over which such minimization is carried out.

Fig. 2. A canonical scheme to construct variants of the Distributed Source Coding problem. The variant of the DSC problem considered in this paper is constructed
by selecting the dashed boxes in each block.

This is explained pictorially in Figure 2. For example, the variant of the DSC problem considered by Slepian and Wolf in [7]
is defined by the assumptions of symmetric communication and block-coding with asymptotically large block-lengths, and has
the objectives of lossless data-gathering at the sink, average-case performance analysis, and minimizing the sum of source bits. The
variant of the DSC problem considered in this paper is constructed by selecting the dashed boxes in each block.

It should be noted that we do not suggest that ours is the only way to classify the variants of the distributed source coding
problem, as one can come up with alternative classification schemes with fewer or more parameters, such as the canonical
classification scheme proposed in [12], which is contained in our scheme. However, we do suggest that, almost forty years
after the seminal Slepian-Wolf paper, it is time to address the DSC problem systematically by following a unified
scheme to classify its variants and construct practical solutions achieving optimal performance. To the best of our knowledge,
ours is the first such scheme.

B. Motivation 

Our motivation to consider distributed source coding problem, in particular a variant of it, comes from its strong connection 
to the problem of maximizing the worst-case operational lifetime of data-gathering wireless sensor networks [13], where 
the base-station collects correlated sensor data. Sensor nodes constituting such networks are assumed to have limited battery 
resources that cannot be replenished either due to sensor nodes being deployed in inaccessible locations or due to high cost 
of retrieving sensor nodes and changing their batteries. This makes it impossible to replace the dead nodes. However such 
networks, once deployed, are expected to have large operational lifetimes. In such a scenario, a key challenge is to develop 
system-level strategies to efficiently utilize finite energy resources to prolong network lifetime. 

Sensor nodes expend energy in sensing/actuating, computation, and communication. However, the harsh nature of wireless 
links determines the energy cost of communication that has the potential to be a major bottleneck. One of the significant factors 
that determines the communication energy expenditure at nodes is the number of bits exchanged in the network for successful 
data-gathering. Therefore any scheme that reduces the number of such bits can make significant contribution to enhance the 
network lifetime. 

In a typical data-gathering sensor network, there are two fundamental asymmetries: resource asymmetry and information
asymmetry. In such networks, it is reasonable to assume that the base-station has large energy, computation, and communication
resources, whereas sensor nodes are resource limited (resource asymmetry). Further, in sensor networks, the sensor nodes
hold the actual sampled information and the base-station may only know general characteristics, such as the joint probability
distribution of sensor data (information asymmetry). We propose that the resource asymmetry in wireless sensor networks
should be exploited to reduce the information asymmetry in such networks. Therefore we argue that the more resourceful and
knowledgeable base-station, which wants to gather sensor-data, should bear most of the burden of computation and communication
in the network. Allowing interactive communication between the base-station and sensor nodes enables us to do this: the
base-station forms and communicates efficient queries to the sensor nodes, to which they respond with short and easily computable
messages.

Given the correlated nature of sensor data, the data-gathering problem in wireless sensor networks can be modeled in terms
of the well-known Distributed Source Coding (DSC) problem [7] or its variants. In the recent past, there have been several such
attempts, such as [8], [11] and various other schemes surveyed in [9]. However, most of these schemes are not really effective
in data-gathering wireless sensor networks, for several reasons. First, in some scenarios we are interested in minimizing
the worst-case number of sensor bits (or equivalently in maximizing the worst-case network lifetime). However, the schemes 
based on average-case information-theoretic analysis cannot be used in such scenarios. Second, such schemes cannot address 
various operational constraints and requirements of sensor network, such as non-availability of global knowledge of the system, 
such as joint distribution of sensor data, at the sensor nodes and low-latency operation. Third, these schemes do not attempt 
to exploit various opportunities, such as resource asymmetry, in such networks to reduce the computation and communication 
burden at nodes. Therefore to address various aforementioned shortcomings of existing distributed source coding schemes in 
the context of data-gathering wireless sensor networks, we introduce a new variant of the classical distributed source coding 
problem as follows. The proposed variant attempts to optimally utilize the resource asymmetry in a data-gathering wireless 
sensor network to minimize the information asymmetry in the network. 

Consider a distributed information-gathering scenario, where a sink collects the information from N correlated information 
sources. The correlation in the sources' data is modeled by joint distribution V, which is known only to the sink. The sink and 
sources can interactively communicate with each other with communication proceeding in rounds. We are primarily concerned 
with minimizing the number of bits that the sources send, in the worst-case, for successful data-gathering at the sink, but we 
are also interested in minimizing both the number of communication rounds and the number of sink bits. 

Our work mainly differs from the previous work on distributed source coding and its applications to sensor networks as 
follows. Firstly, we assume asymmetric communication where only the sink knows the correlation structure of sources' data. 
This is in contrast to existing DSC schemes that assume that all nodes know the correlation structure. Secondly, unlike existing 
DSC schemes that perform average-case information-theoretic analysis, we are concerned with the worst-case performance 
analysis of distributed source coding. As the average-case information measure of entropy or its variants cannot be used for 
the worst-case information-theoretic analysis, we introduce information ambiguity - a new information measure for worst-case 
information-theoretic analysis. Thirdly, we are interested in distributed compression when only a single instance of data is 
available at every information source (oneshot compression), unlike the majority of current DSC schemes that derive their results
in the regime of infinite block-lengths. Finally, we consider a more powerful model of communication where the sink and 
sources interactively communicate with each other. 

Note on the terminology: We consider communication system consisting of communicating agents of two types: information 
sources and information sink. We address information sources also as informants and source nodes. Similarly, we also address 
the information sink interchangeably as the receiver and the recipient. 

C. Organization 

The paper is organized as follows. In Section II we survey the related work. Section III introduces the notion of information
ambiguity for worst-case information-theoretic analyses, discusses some of its properties, and proves that it is a valid
information measure. In Section V we provide a precise description of the communication model we assume and formally
introduce the distributed data-gathering problem we address in this paper. Then, Section VI provides the solutions of this
problem under two different communication scenarios. We first present an interactive communication protocol to optimally
minimize the number of informant bits required in the worst-case to solve the problem. Later, we provide an optimal interactive
parallel communication protocol that efficiently trades off the number of informant bits to reduce the number of communication
rounds and the number of sink bits. Section VII investigates the role of block-coding in the worst-case analysis of the distributed
source coding problem and proves that, unlike the average-case performance, the worst-case performance of DSC problems derives
almost no advantage from block-coding compared to oneshot compression. Finally, we conclude and discuss some future
work in Section VIII.

II. Related work 

The Slepian-Wolf solution [7] of the distributed source coding problem, though fundamental, is essentially existential and
non-constructive, like many other results of classical information theory. Though it establishes the lower bounds on the
information rates, it does not provide the optimal source codes or any computationally efficient method of constructing them.
Therefore, in the recent past, numerous attempts have been made to provide practical solutions for it. In [14], the authors proposed
the DISCUS framework, which gives a practical, though not necessarily optimal, method to construct source codes. Though the
duality between Slepian-Wolf encoding and the multiple-access channel was already well known [6], the connection
that this work made between distributed source coding and channel coding motivated researchers to use various
channel codes, such as Turbo codes [15]-[17], LDPC codes [18]-[21], and convolutional codes [22], to solve the distributed
source coding problem. In [23] and related papers, Zhao and Effros address the lossless and near-lossless source code
design and construction problem. Also, given the asymmetry in the available energy and computational resources between the
base-station and the sensor nodes, [24] argues for using such asymmetric channel codes to reduce the energy consumption at the
sensor nodes. The survey in [9] and the references therein provide more details about some of these research efforts.

These developments, though pragmatic and constructive, are not very practical in the context of sensor networks, particularly 
due to their assumption of symmetric communication scenarios, where all nodes in the network are assumed to know the joint 
probability distribution of sensor-data, and requirement of large coding dimensions, where a large number of independent 
and identically distributed (i.i.d) samples are drawn and each informant encodes the sequence of these samples as a single 
codeword to achieve the optimal performance. Given the limited communication and computation capabilities of the sensor 
nodes, it is neither reasonable to assume that the sensor nodes know the joint distribution of all sensors' data, nor to assume 
that sensor nodes can carry out high-complexity encoding. Also, block-encoding with very large block-lengths (typically
$\sim 10^4$ data samples) required by these schemes may incur large data-gathering delays, rendering these solutions inefficient,
given the time-criticality of sensor-data.

The notion of interactive communication in addressing the distributed source coding problem was introduced in [10]. Later,
in [8], the authors attempted to deploy sink-based feedback to construct practical schemes to address the data-gathering problem in
data-gathering wireless sensor networks. However, with just a single feedback message, this work could not make use
of the full potential of interactive communication in realizing optimum DSC performance in sensor networks. Adler in
[11] analyzes the performance of the distributed source coding problem in a scenario where there is only a
single instance of informant-data and the sink and informants communicate interactively. However, like all previous work on
distributed source coding, this work performs only average-case analysis.

The notion of information ambiguity that we propose as the worst-case equivalent of the notion of information entropy was
introduced by Orlitsky in [25], but in a different context than ours. Also, researchers in the field of "Possibility Theory" have
endeavored to define some information measures which are closely related to the notion of information ambiguity. However,
it is beyond the scope of this paper to discuss those efforts, and an interested reader can find a broad survey of such work
in [3], [26].

III. Information Ambiguity 

Shannon's original theorems and the subsequent theorems of information theory are asymptotic results based on
large deviations theory. They require a very large set of data samples and lead to what we call average-case
results. Worst-case analysis deals with sparse data-gathering situations and is the sole focus of this work. Leaving the precise
implementation motivations for its definition to the later sections, here we define a new information measure which we call
Information Ambiguity, show that it is a valid information measure, and characterize some of its properties useful for our later
results.

We begin by introducing the notion of information ambiguity for two random variables and then provide its exposition for 
arbitrary number of variables. Note that throughout the paper all the logarithms are to base two. 

A. Ambiguity: Two Random Variables 

Consider a pair of random variables $(X_1, X_2) \sim \mathcal{P} = p(x_1, x_2)$, $X_1 \in \mathcal{X}$ and $X_2 \in \mathcal{X}$, where $\mathcal{X}$ is a discrete and finite
alphabet of size $|\mathcal{X}|$¹ and $\mathcal{P}$ is the joint probability distribution of $(X_1, X_2)$. The support set of $(X_1, X_2)$ is defined as:

$S_{X_1,X_2} \stackrel{\text{def}}{=} \{(x_1, x_2) \mid p(x_1, x_2) > 0\}$    (1)

We also call $S_{X_1,X_2}$ the ambiguity set of $(X_1, X_2)$. The cardinality of $S_{X_1,X_2}$ is called the joint ambiguity, or simply the ambiguity,
of $(X_1, X_2)$ and is denoted $\mu_{X_1,X_2} = |S_{X_1,X_2}|$. The minimum number of bits required to describe any element of $S_{X_1,X_2}$ is
$\lceil \log \mu_{X_1,X_2} \rceil$.

The support set of $X_1$ is the set

$S_{X_1} \stackrel{\text{def}}{=} \{x_1 : \text{for some } x_2,\ (x_1, x_2) \in S_{X_1,X_2}\}$,    (2)

of all possible $X_1$ values. We also call $S_{X_1}$ the ambiguity set of $X_1$. The ambiguity of $X_1$ is defined as $\mu_{X_1} = |S_{X_1}|$. The
ambiguity set and the corresponding ambiguity of random variable $X_2$ are similarly defined.

¹ In general, $X_1 \in \mathcal{X}_1$ and $X_2 \in \mathcal{X}_2$, where $\mathcal{X}_1$ and $\mathcal{X}_2$ are discrete alphabet sets with possibly different cardinalities. However, to keep the discussion
simple, we assume henceforth that all the random variables take values from the same discrete alphabet $\mathcal{X}$.

The conditional ambiguity set of $X_1$, when random variable $X_2$ takes the value $x_2 \in S_{X_2}$, is

$S_{X_1|X_2}(x_2) = \{x_1 : (x_1, x_2) \in S_{X_1,X_2}\}$,    (3)

the set of possible $X_1$ values when $X_2 = x_2$. The conditional ambiguity in that case is

$\mu_{X_1|X_2}(x_2) = |S_{X_1|X_2}(x_2)|$,    (4)

the number of possible $X_1$ values when $X_2 = x_2$. The maximum conditional ambiguity of $X_1$ is

$\hat{\mu}_{X_1|X_2} = \max\{\mu_{X_1|X_2}(x_2) : x_2 \in S_{X_2}\}$,    (5)

the maximum number of $X_1$ values possible with any value that $X_2$ can take. We denote the corresponding maximum conditional
ambiguity set as $\hat{S}_{X_1|X_2}$.

The quantities $S_{X_2|X_1}(x_1)$, $\mu_{X_2|X_1}(x_1)$, $\hat{S}_{X_2|X_1}$, and $\hat{\mu}_{X_2|X_1}$ are similarly defined by exchanging the roles of $X_1$ and $X_2$
in the preceding discussion.

Define a functional called information ambiguity as $\mathcal{I}_{X_1,X_2} = \lceil \log \mu_{X_1,X_2} \rceil$ for a set of two random variables $X_1$ and $X_2$.
Next, we prove certain properties of functional $\mathcal{I}_{X_1,X_2}$.

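Lemma 1: $\mathcal{I}_{X_1|X_2}(X_2 = x_2) \le \mathcal{I}_{X_1}$ for all $x_2 \in S_{X_2}$, that is, conditioning reduces information ambiguity.

Proof: From the definitions of $S_{X_1}$ and $S_{X_1|X_2}(x_2)$, it is obvious that $S_{X_1|X_2}(x_2) \subseteq S_{X_1}$. This implies that
$\mu_{X_1|X_2}(x_2) \le \mu_{X_1}$, and hence $\lceil \log \mu_{X_1|X_2}(x_2) \rceil \le \lceil \log \mu_{X_1} \rceil$. ■

Also, it follows from Lemma 1 and (5) that $\hat{\mathcal{I}}_{X_1|X_2} \le \mathcal{I}_{X_1}$, where $\hat{\mathcal{I}}_{X_1|X_2} = \lceil \log \hat{\mu}_{X_1|X_2} \rceil$.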

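Lemma 2: (Subadditivity) If $X_1$ and $X_2$ are "interacting", that is $S_{X_1,X_2} \subset S_{X_1} \times S_{X_2}$, then $\mathcal{I}_{X_1,X_2} \le \mathcal{I}_{X_1} + \mathcal{I}_{X_2}$.

Proof: We know that $S_{X_1,X_2} \subseteq S_{X_1} \times S_{X_2}$. So,

$\mu_{X_1,X_2} \le \mu_{X_1} \times \mu_{X_2}$
$\log \mu_{X_1,X_2} \le \log(\mu_{X_1} \times \mu_{X_2}) = \log \mu_{X_1} + \log \mu_{X_2}$
$\lceil \log \mu_{X_1,X_2} \rceil \le \lceil \log \mu_{X_1} + \log \mu_{X_2} \rceil \le \lceil \log \mu_{X_1} \rceil + \lceil \log \mu_{X_2} \rceil$,

thus proving the lemma. ■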
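Lemma 3: (Additivity) If $X_1$ and $X_2$ are "non-interacting", that is $S_{X_1,X_2} = S_{X_1} \times S_{X_2}$, then $\mathcal{I}_{X_1,X_2} \doteq \mathcal{I}_{X_1} + \mathcal{I}_{X_2}$, where
$\doteq$ denotes equality within one bit per random variable.

Proof: We know that $S_{X_1,X_2} = S_{X_1} \times S_{X_2}$. So,

$\mu_{X_1,X_2} = \mu_{X_1} \times \mu_{X_2}$
$\log \mu_{X_1,X_2} = \log(\mu_{X_1} \times \mu_{X_2}) = \log \mu_{X_1} + \log \mu_{X_2}$
$\lceil \log \mu_{X_1,X_2} \rceil = \lceil \log \mu_{X_1} + \log \mu_{X_2} \rceil \doteq \lceil \log \mu_{X_1} \rceil + \lceil \log \mu_{X_2} \rceil$

Thus proving the lemma. ■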
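
The "within one bit" qualifier in Lemma 3 can be checked numerically. The sketch below is an illustration only, and the helper names `bits` and `additivity_gap` are ours: for a product support with marginal ambiguities $m$ and $n$, it compares $\lceil \log mn \rceil$ with $\lceil \log m \rceil + \lceil \log n \rceil$.

```python
def bits(mu: int) -> int:
    """ceil(log2(mu)) for mu >= 1, computed exactly with integer arithmetic."""
    return (mu - 1).bit_length()

def additivity_gap(m: int, n: int) -> int:
    """Sum of the marginal information ambiguities minus the joint one,
    when the joint support is the full product (joint ambiguity m * n)."""
    return bits(m) + bits(n) - bits(m * n)

# Marginal ambiguities 5 and 5: the joint needs ceil(log2 25) bits,
# while the two marginals need ceil(log2 5) bits each.
print(bits(25), bits(5) + bits(5), additivity_gap(5, 5))

# Exhaustive check over small ambiguities: collect every gap that occurs.
gaps = {additivity_gap(m, n) for m in range(1, 129) for n in range(1, 129)}
print(sorted(gaps))
```

For two variables the gap is never negative and never exceeds one bit, which is why the lemma states equality only up to rounding.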
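Lemma 4: $\mathcal{I}_{X_1,X_2} \le \mathcal{I}_{X_1} + \hat{\mathcal{I}}_{X_2|X_1}$

Proof: Each $x_1 \in S_{X_1}$ occurs in $S_{X_1,X_2}$ with at most $\hat{\mu}_{X_2|X_1}$ values of $x_2$, so $|S_{X_1,X_2}| \le |S_{X_1}| \times |\hat{S}_{X_2|X_1}|$. So,

$\mu_{X_1,X_2} \le \mu_{X_1} \times \hat{\mu}_{X_2|X_1}$
$\log \mu_{X_1,X_2} \le \log \mu_{X_1} + \log \hat{\mu}_{X_2|X_1}$
$\lceil \log \mu_{X_1,X_2} \rceil \le \lceil \log \mu_{X_1} + \log \hat{\mu}_{X_2|X_1} \rceil \le \lceil \log \mu_{X_1} \rceil + \lceil \log \hat{\mu}_{X_2|X_1} \rceil$

This proves the lemma. ■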
Corollary 1: $\mathcal{I}_{X_1,X_2} \le \mathcal{I}_{X_2} + \hat{\mathcal{I}}_{X_1|X_2}$

Proof: Reversing the roles of $X_1$ and $X_2$ in the proof of Lemma 4 completes the proof. ■

Lemma 5: Let $\Pi$ denote the set of two possible permutations of $\{1, 2\}$, then

$\mathcal{I}_{X_1,X_2} \le \min_{\pi \in \Pi} \left( \mathcal{I}_{X_{\pi(1)}} + \hat{\mathcal{I}}_{X_{\pi(2)}|X_{\pi(1)}} \right)$

Proof: The proof follows from combining Lemma 4 and Corollary 1. ■
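
As a sanity check on the two-variable bounds (subadditivity, Lemma 4, and Corollary 1, and hence Lemma 5), the following sketch tests them on randomly drawn supports. It is an illustration under our own assumptions: the function names, the 6-by-6 alphabet, and the fixed seed are arbitrary choices, not part of the paper.

```python
import random

def bits(mu: int) -> int:
    """ceil(log2(mu)) for mu >= 1, via exact integer arithmetic."""
    return (mu - 1).bit_length()

def max_cond(support, target, given):
    """Largest number of distinct `target` values seen with a single `given` value."""
    groups = {}
    for pair in support:
        groups.setdefault(pair[given], set()).add(pair[target])
    return max(len(g) for g in groups.values())

def bounds(support):
    """Return (joint, subadditive bound, Lemma 4 bound, Corollary 1 bound), in bits."""
    mu1 = len({p[0] for p in support})
    mu2 = len({p[1] for p in support})
    return (bits(len(support)),
            bits(mu1) + bits(mu2),
            bits(mu1) + bits(max_cond(support, 1, 0)),
            bits(mu2) + bits(max_cond(support, 0, 1)))

rng = random.Random(0)  # fixed seed so the run is repeatable
cells = [(a, b) for a in range(6) for b in range(6)]
violations = 0
for _ in range(2000):
    support = set(rng.sample(cells, rng.randint(1, len(cells))))
    joint, sub, lem4, cor1 = bounds(support)
    if not (joint <= sub and joint <= min(lem4, cor1)):
        violations += 1
print("violations:", violations)
```

A random search of this kind cannot replace the proofs, but it is a cheap way to catch a misreading of the definitions.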
Fig. 3. The probability distribution $\mathcal{P}$ for $(X_1, X_2)$.



B. Ambiguity Computation: An Example 

We illustrate some of the definitions and properties of the notion of information ambiguity discussed in this section,
using the probability distribution $\mathcal{P}$ for two random variables $X_1, X_2$ given in Figure 3.

Using (1), the support set $S_{X_1,X_2}$, with each pair written as $(x_1, x_2)$, is:

$S_{X_1,X_2} = \{(1, 1), (2, 1), (3, 1), (4, 1), (5, 1), (1, 2), (2, 3), (3, 4)\}$

with corresponding ambiguity $\mu_{X_1,X_2} = |S_{X_1,X_2}| = 8$. Therefore, the number of bits required to describe the elements of
$S_{X_1,X_2}$ is $\mathcal{I}_{X_1,X_2} = \lceil \log \mu_{X_1,X_2} \rceil = 3$ bits.

Further, using (2), the support sets of $X_1$ and $X_2$ are, respectively:

$S_{X_1} = \{1, 2, 3, 4, 5\}$, $\mu_{X_1} = |S_{X_1}| = 5$
$S_{X_2} = \{1, 2, 3, 4\}$, $\mu_{X_2} = |S_{X_2}| = 4$

The minimum number of bits required to describe the elements of $S_{X_1}$ and $S_{X_2}$ are $\mathcal{I}_{X_1} = \lceil \log \mu_{X_1} \rceil = 3$ bits and
$\mathcal{I}_{X_2} = \lceil \log \mu_{X_2} \rceil = 2$ bits, respectively.
Computing the conditional ambiguity sets and corresponding conditional ambiguities for $X_1$, we have using (3):

$S_{X_1|X_2}(X_2 = 1) = \{1, 2, 3, 4, 5\}$, $\mu_{X_1|X_2}(X_2 = 1) = 5$
$S_{X_1|X_2}(X_2 = 2) = \{1\}$, $\mu_{X_1|X_2}(X_2 = 2) = 1$
$S_{X_1|X_2}(X_2 = 3) = \{2\}$, $\mu_{X_1|X_2}(X_2 = 3) = 1$
$S_{X_1|X_2}(X_2 = 4) = \{3\}$, $\mu_{X_1|X_2}(X_2 = 4) = 1$

Therefore, the maximum conditional ambiguity in $X_1$ given $X_2$ is $\hat{\mu}_{X_1|X_2} = 5$.
Similarly, for $X_2$, we have:

$S_{X_2|X_1}(X_1 = 1) = \{1, 2\}$, $\mu_{X_2|X_1}(X_1 = 1) = 2$
$S_{X_2|X_1}(X_1 = 2) = \{1, 3\}$, $\mu_{X_2|X_1}(X_1 = 2) = 2$
$S_{X_2|X_1}(X_1 = 3) = \{1, 4\}$, $\mu_{X_2|X_1}(X_1 = 3) = 2$
$S_{X_2|X_1}(X_1 = 4) = \{1\}$, $\mu_{X_2|X_1}(X_1 = 4) = 1$
$S_{X_2|X_1}(X_1 = 5) = \{1\}$, $\mu_{X_2|X_1}(X_1 = 5) = 1$

Hence, the maximum conditional ambiguity in $X_2$ given $X_1$ is $\hat{\mu}_{X_2|X_1} = 2$.
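Above, we computed $\mathcal{I}_{X_1,X_2} = 3$ bits. Now, let us compute

$\mathcal{I}_{X_1} + \hat{\mathcal{I}}_{X_2|X_1} = 3 \text{ bits} + 1 \text{ bit} = 4 \text{ bits}$
$\mathcal{I}_{X_2} + \hat{\mathcal{I}}_{X_1|X_2} = 2 \text{ bits} + 3 \text{ bits} = 5 \text{ bits}$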



This illustrates Lemma 4, Corollary 1, and Lemma 5.
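
The numbers in this example can be reproduced mechanically. The sketch below is a minimal illustration, not the authors' code: the function names are ours, and the support set is taken to be the one implied by the conditional ambiguity sets listed above, with every pair ordered as $(x_1, x_2)$.

```python
# Support of (X1, X2), pairs ordered (x1, x2), as implied by the conditional sets above.
SUPPORT = {(1, 1), (2, 1), (3, 1), (4, 1), (5, 1), (1, 2), (2, 3), (3, 4)}

def bits(mu: int) -> int:
    """Information ambiguity ceil(log2(mu)) for mu >= 1, via exact integer arithmetic."""
    return (mu - 1).bit_length()

def support_of(support, idx):
    """Ambiguity set of the variable at position idx (0 for X1, 1 for X2)."""
    return {pair[idx] for pair in support}

def conditional_sets(support, target, given):
    """Map each value of the `given` variable to the compatible `target` values."""
    sets = {}
    for pair in support:
        sets.setdefault(pair[given], set()).add(pair[target])
    return sets

s1, s2 = support_of(SUPPORT, 0), support_of(SUPPORT, 1)
x1_given_x2 = conditional_sets(SUPPORT, target=0, given=1)
x2_given_x1 = conditional_sets(SUPPORT, target=1, given=0)
mu_hat_1_given_2 = max(len(s) for s in x1_given_x2.values())
mu_hat_2_given_1 = max(len(s) for s in x2_given_x1.values())

print("joint bits:", bits(len(SUPPORT)))
print("I_X1 + max-conditional I_X2|X1:", bits(len(s1)) + bits(mu_hat_2_given_1))
print("I_X2 + max-conditional I_X1|X2:", bits(len(s2)) + bits(mu_hat_1_given_2))
```

Swapping in a different support set is enough to explore how the two conditional bounds compare for other correlation structures.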

C. Ambiguity: N Random Variables 

The definitions and results of subsection III-A for the information ambiguity of two random variables are easily extended
to their multiple random variable counterparts for the information ambiguity of a set of $N$ random variables
$U = \{X_1, \ldots, X_N\}$.

Consider a discrete and finite probability distribution $\mathcal{P}$ for $N$ random variables $X_i$, $i \in \mathcal{N}$, $X_i \in \mathcal{X}$, where $\mathcal{N} = \{1, \ldots, N\}$
and $\mathcal{X}$ is the discrete and finite alphabet of size $|\mathcal{X}|$. Consider an $N$-tuple of random variables $(X_1, \ldots, X_N) \sim \mathcal{P} =
p(x_1, \ldots, x_N)$. The support set of $(X_1, \ldots, X_N)$ is defined as:

$S_{X_1,\ldots,X_N} \stackrel{\text{def}}{=} \{(x_1, \ldots, x_N) \mid p(x_1, \ldots, x_N) > 0\}$    (6)

We also call $S_{X_1,\ldots,X_N}$ the ambiguity set of $(X_1, \ldots, X_N)$. The cardinality of $S_{X_1,\ldots,X_N}$ is called the ambiguity of $(X_1, \ldots, X_N)$
and is denoted $\mu_{X_1,\ldots,X_N} = |S_{X_1,\ldots,X_N}|$. Therefore the minimum number of bits required to describe an element in $S_{X_1,\ldots,X_N}$
is $\lceil \log \mu_{X_1,\ldots,X_N} \rceil$.

Consider sets of random variables $X_A$ and $X_A^c$ such that $X_A^c = U \setminus X_A$. Denote instances of $X_A$ and $X_A^c$ as $x_A$ and $x_A^c$,
respectively. The support set of $X_A$ is defined as:

$S_{X_A} \stackrel{\text{def}}{=} \{x_A \mid \text{for some } x_A^c,\ (x_A, x_A^c) \in S_{X_1,\ldots,X_N}\}$

We also call $S_{X_A}$ the ambiguity set of $X_A$, with corresponding ambiguity denoted $\mu_{X_A}$ and defined as $\mu_{X_A} = |S_{X_A}|$.
So, the minimum number of bits required to describe any value of $X_A$ is $\lceil \log \mu_{X_A} \rceil$.

Consider random variable $X_i \in X_A$ and the set of random variables $X_B \subset X_A$, $X_i \notin X_B$. Denote an instance of $X_B$ as
$x_B$. The conditional ambiguity set of $X_i$, when set $X_B$ takes value $x_B \in S_{X_B}$, is

$S_{X_i|X_B}(x_B) \stackrel{\text{def}}{=} \{x_i : (x_i, x_B) \in S_{X_A}\}$,    (7)

the set of possible $X_i$ values when $X_B = x_B$. The conditional ambiguity in that case is

$\mu_{X_i|X_B}(x_B) = |S_{X_i|X_B}(x_B)|$,    (8)

the number of possible $X_i$ values with $X_B = x_B$. The maximum conditional ambiguity of $X_i$ is

$\hat{\mu}_{X_i|X_B} = \sup\{\mu_{X_i|X_B}(x_B) : x_B \in S_{X_B}\}$,    (9)

the maximum number of $X_i$ values possible over any $x_B$.

In fact, for any two subsets $X_A$ and $X_B$ of $\{X_1, \ldots, X_N\}$ such that $X_A \cup X_B \subseteq \{X_1, \ldots, X_N\}$ and $X_A \cap X_B = \emptyset$,
we can define, for example, the ambiguity set $S_{X_A}$ of $X_A$, the conditional ambiguity set $S_{X_A|X_B}(x_B)$ of $X_A$ given the set of
values $x_B$ that $X_B$ can take, and the maximum conditional ambiguity set $\hat{S}_{X_A|X_B}$ of $X_A$ for any set of values that $X_B$ can take,
with corresponding ambiguity, conditional ambiguity, and maximum conditional ambiguity denoted by $\mu_{X_A}$, $\mu_{X_A|X_B}(x_B)$,
and $\hat{\mu}_{X_A|X_B}$, respectively. However, for the sake of brevity, we do not introduce these definitions here as those can be easily
developed along the lines of the definitions in (7)-(9).
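
One possible way to develop these subset-level definitions is sketched below. It is only an illustration under our own assumptions: variables are identified by their positions in the tuple, the function names are ours, and the three-variable parity support is a hypothetical example rather than data from the paper.

```python
def bits(mu: int) -> int:
    """ceil(log2(mu)) for mu >= 1, via exact integer arithmetic."""
    return (mu - 1).bit_length()

def project(support, idxs):
    """Ambiguity set of the variables at positions idxs."""
    return {tuple(x[i] for i in idxs) for x in support}

def conditional_set(support, a_idxs, b_idxs, x_b):
    """Values of the variables in A that are possible when those in B equal x_b."""
    return {tuple(x[i] for i in a_idxs) for x in support
            if tuple(x[i] for i in b_idxs) == tuple(x_b)}

def max_conditional_ambiguity(support, a_idxs, b_idxs):
    """Largest conditional ambiguity of A over all possible values of B."""
    return max(len(conditional_set(support, a_idxs, b_idxs, x_b))
               for x_b in project(support, b_idxs))

# Hypothetical support: three binary variables constrained to even parity.
SUPPORT = {(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)}

print("joint information ambiguity:", bits(len(SUPPORT)))
print("X3 given (X1, X2):", max_conditional_ambiguity(SUPPORT, [2], [0, 1]))
print("X3 given X1 only:", max_conditional_ambiguity(SUPPORT, [2], [0]))
```

In the parity example the third variable is fully determined by the other two but remains completely ambiguous given only one of them, which is the kind of structure the maximum conditional ambiguity is meant to capture.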

Further, let us represent each of the $\mu_{X_i}$ values that random variable $X_i$ can assume in $\lceil \log \mu_{X_i} \rceil$ bits as $b_1 \ldots b_{\lceil \log \mu_{X_i} \rceil}$. Let
$\mathrm{binary}_j(x_i)$ represent the value of the $j$th bit-location in the bit-representation of $x_i$, $1 \le j \le \lceil \log \mu_{X_i} \rceil$. Then, knowing that
the value of the $j$th bit-location is $b$, we can define the set of possible values that $X_i$ can take as

$S_{X_i|b^i_j}(b) \stackrel{\text{def}}{=} \{x_i : x_i \in S_{X_i} \text{ and } \mathrm{binary}_j(x_i) = b\}$,    (10)

with corresponding cardinality denoted as $\mu_{X_i|b^i_j}(b)$. We can similarly define $S_{X_A|b^i_j}(b)$ with $X_i \in X_A$ as

$S_{X_A|b^i_j}(b) \stackrel{\text{def}}{=} \{x_A : x_A \in S_{X_A} \text{ and } \mathrm{binary}_j(x_i) = b\}$,    (11)

with corresponding cardinality denoted as $\mu_{X_A|b^i_j}(b)$. The definitions in (10) and (11) can be easily extended further to the
situations where the values of one or more bit-locations in one or more random variables' bit-representations are known.
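
The effect of learning a single bit-location can be sketched as follows. The paper does not fix a particular bit-representation at this point, so the encoding used here is an assumption: each value is labelled by its rank in sorted order, written most-significant bit first, and bit-locations are numbered from 1. The function names are ours.

```python
def bit_labels(values):
    """Assumed encoding: label each value by its rank in sorted order,
    using ceil(log2(number of values)) bits, most-significant bit first."""
    ordered = sorted(values)
    width = (len(ordered) - 1).bit_length()
    return {v: format(rank, "b").zfill(width) if width else ""
            for rank, v in enumerate(ordered)}

def values_given_bit(values, j, b):
    """Values whose j-th bit-location (1-indexed) equals b under bit_labels."""
    labels = bit_labels(values)
    return {v for v, label in labels.items() if label[j - 1] == str(b)}

S_X1 = {1, 2, 3, 4, 5}  # ambiguity set with five values, so three bits per value
print(bit_labels(S_X1))
print(values_given_bit(S_X1, j=1, b=1))
print(values_given_bit(S_X1, j=3, b=0))
```

Under this encoding the first bit-location is very uneven (only one value has it set), while the last splits the set almost in half, so which bit the sink asks for affects how much ambiguity a single answer removes.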

Define a functional called information ambiguity as $\mathcal{I}_{X_1,\ldots,X_N} = \lceil \log \mu_{X_1,\ldots,X_N} \rceil$ for a set of $N$ random variables $X_1, \ldots, X_N$.
Next, we prove certain properties of functional $\mathcal{I}_{X_1,\ldots,X_N}$.

Lemma 6: (Expansibility) If a component $(x_1, \ldots, x_N)$ with $p(x_1, \ldots, x_N) = 0$ is added to joint distribution $\mathcal{P}$, then
information ambiguity $\mathcal{I}_{X_1,\ldots,X_N}$ does not change.

Proof: The proof follows from the definition of $\mu_{X_1,\ldots,X_N}$. ■

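Lemma 7: (Monotonicity) If $A$ and $B$ are two discrete and finite sets with $A \subseteq B$, then $\mathcal{I}_A \le \mathcal{I}_B$.

Proof: If $A \subseteq B$, then with $\mu_A = |A|$ and $\mu_B = |B|$,

$$\mu_A \le \mu_B$$
$$\log \mu_A \le \log \mu_B$$
$$\lceil \log \mu_A \rceil \le \lceil \log \mu_B \rceil,$$

thus proving the lemma. ■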

Lemma 8: (Symmetry) $\mathcal{I}_{X_1,\ldots,X_N} = \mathcal{I}_{\pi(X_1,\ldots,X_N)}$ for permutations $\pi(X_1, \ldots, X_N)$.

Proof: The proof follows from the observation that any rearrangement of the elements of the universal set $U$ does not change the cardinality of the support-set $S_{X_1,\ldots,X_N}$. ■

Lemma 9: (Subadditivity) If $X_i$, $1 \le i \le N$, are "interacting", that is, $S_{X_1,\ldots,X_N} \subset S_{X_1} \times \ldots \times S_{X_N}$, then $\mathcal{I}_{X_1,\ldots,X_N} \le \sum_{i=1}^{N} \mathcal{I}_{X_i}$.

Proof: The proof follows from the straightforward extension of the proof of Lemma 2 to multiple random variables. ■

Lemma 10: (Additivity) If $X_i$, $1 \le i \le N$, are "non-interacting", that is, $S_{X_1,\ldots,X_N} = S_{X_1} \times \ldots \times S_{X_N}$, then $\mathcal{I}_{X_1,\ldots,X_N} = \sum_{i=1}^{N} \mathcal{I}_{X_i}$.

Proof: Follows from extending the proof of Lemma 3 to multiple random variables. ■

The above lemmas establish that the functional $\mathcal{I}_{X_1,\ldots,X_N} = \lceil \log \mu_{X_1,\ldots,X_N} \rceil$ is a valid information measure, as it satisfies the axioms of expansibility, monotonicity, symmetry, subadditivity, and additivity of valid information measures [3].

Remark: An astute reader may note that, in spite of the apparent similarities between the information ambiguity proposed above and the well-known Hartley measure, these two information measures are fundamentally different. For a set of $N$ random variables, the two measures define their unconditional versions identically in terms of the functional $\log \mu_{X_1,\ldots,X_N}$. However, they differ in their definitions of the corresponding conditional versions. While the conditional Hartley measure characterizes average nonspecificity, conditional information ambiguity characterizes maximum nonspecificity. For example, for a set of two random variables $X$ and $Y$, in the conditional Hartley measure $H(\mu_X|\mu_Y) = \log \frac{\mu_{X,Y}}{\mu_Y}$, the ratio $\frac{\mu_{X,Y}}{\mu_Y}$ represents the average number of elements of $\mu_X$ possible under the condition that an element from $\mu_Y$ has been chosen [3, Chapter 2], while the maximum conditional ambiguity $\widehat{\mu}_{X|Y}$ in conditional information ambiguity $\mathcal{I}_{X|Y}$ represents the maximum number of possible elements of $\mu_X$ under the condition that an element from $\mu_Y$ has been chosen.
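The difference between average and maximum nonspecificity can be checked numerically. The sketch below uses the ten-element support set listed in Figure 5(c) as input; the function name `conditional_sets` and the variable names are illustrative assumptions, not notation from the paper.

```python
from collections import defaultdict

# Support set of Figure 5(c), written as (x, y) pairs of 3-bit strings.
SUPPORT = [
    ("000", "000"), ("000", "010"), ("001", "001"), ("001", "011"),
    ("010", "000"), ("010", "010"), ("010", "100"), ("011", "001"),
    ("011", "011"), ("100", "010"),
]


def conditional_sets(support):
    """Map each value y to the set of x values that can occur together with it."""
    by_y = defaultdict(set)
    for x, y in support:
        by_y[y].add(x)
    return dict(by_y)


cond = conditional_sets(SUPPORT)
mu_xy = len(set(SUPPORT))   # joint ambiguity
mu_y = len(cond)            # ambiguity of Y

# Ratio inside the conditional Hartley measure: an average over all y.
average_nonspecificity = mu_xy / mu_y
# Maximum conditional ambiguity: the worst case over all y.
max_conditional_ambiguity = max(len(xs) for xs in cond.values())

print(average_nonspecificity, max_conditional_ambiguity)
```

On this support set the average is 2 elements per value of $Y$, whereas the worst case (reached at $y = 010$) is 3, so the two conditional measures already disagree on a small example.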
Lemma 11: Let $\Pi$ denote the set of all possible permutations of $\{1, \ldots, N\}$ and $\pi \in \Pi$, then

$$\mathcal{I}_{X_1,\ldots,X_N} \le \min_{\pi \in \Pi} \sum_{i=1}^{N} \mathcal{I}_{X_{\pi(i)}|X_{\pi(1)},\ldots,X_{\pi(i-1)}}.$$

Proof: Combining the proofs of Lemma 4 and Corollary 1, generalized to $N$ random variables, proves the lemma. These generalizations themselves are easily obtainable from their two-variable counterparts, as in the proof of Lemma 9. ■
Lemma 12: $S_{X_i|X_A}(x_A) = \bigcap_{X_j \in X_A} S_{X_i|X_j}(x_j)$

Proof: We prove the lemma by individually proving both directions of inclusion.

• $S_{X_i|X_A}(x_A) \subseteq \bigcap_{X_j \in X_A} S_{X_i|X_j}(x_j)$: Consider $s \in S_{X_i|X_A}(x_A)$. We need to prove that $s \in S_{X_i|X_j}(x_j)$, $\forall X_j \in X_A$. By definition, $s$ is one of the values that the random variable $X_i$ can take when $X_j = x_j$, $\forall X_j \in X_A$. This implies that $s \in S_{X_i|X_j}(x_j)$, $\forall X_j \in X_A$.

• $S_{X_i|X_A}(x_A) \supseteq \bigcap_{X_j \in X_A} S_{X_i|X_j}(x_j)$: Consider $s \in \bigcap_{X_j \in X_A} S_{X_i|X_j}(x_j)$. This implies that $s \in S_{X_i|X_j}(x_j)$, $\forall X_j \in X_A$. Now, let us suppose that $s \notin S_{X_i|X_A}(x_A)$. However, this leads to a contradiction, as $S_{X_i|X_A}(x_A)$ is defined to be the set of all those values that $X_i$ can take when $X_j = x_j$, $\forall X_j \in X_A$.

Combining these two proofs proves the lemma. ■
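Lemma 13: $\mu_{X_i|X_A}(x_A) \le \min_{X_j \in X_A} \mu_{X_i|X_j}(x_j)$

Proof: First consider the intersection of a finite number of finite sets $A_i$, $i \in I$, where $I$ is some index set. The intersection is contained in each $A_i$, so $|\bigcap_{i \in I} A_i| \le \min_{i \in I} |A_i|$. Then

$$\mu_{X_i|X_A}(x_A) \stackrel{(a)}{=} |S_{X_i|X_A}(x_A)|$$
$$\stackrel{(b)}{=} \Big|\bigcap_{X_j \in X_A} S_{X_i|X_j}(x_j)\Big|$$
$$\stackrel{(c)}{\le} \min_{X_j \in X_A} |S_{X_i|X_j}(x_j)|$$
$$= \min_{X_j \in X_A} \mu_{X_i|X_j}(x_j),$$

where (a) follows from the definition of $\mu_{X_i|X_A}(x_A)$, (b) follows from Lemma 12, and (c) follows from $|\bigcap_{i \in I} A_i| \le \min_{i \in I} |A_i|$. This proves the lemma. ■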
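The inequality of Lemma 13 can be checked mechanically on a small example. The sketch below is only an illustration under stated assumptions: the three-variable support set `SUPPORT` is hypothetical (it is not taken from the paper), with each triple read as $(x_i, x_a, x_b)$ and $X_A = \{X_a, X_b\}$, and the helper names are invented for this example.

```python
# Hypothetical support set of triples (x_i, x_a, x_b).
SUPPORT = [(0, 0, 0), (1, 0, 0), (2, 0, 1), (0, 1, 1), (1, 1, 0), (2, 1, 1)]


def cond_on_pair(support, a, b):
    """Values of X_i compatible with X_a = a and X_b = b jointly."""
    return {xi for xi, xa, xb in support if (xa, xb) == (a, b)}


def cond_on_single(support, position, value):
    """Values of X_i compatible with one conditioning variable alone."""
    return {t[0] for t in support if t[position] == value}


def lemma13_holds(support):
    """Check mu(X_i | X_A = x_A) <= min_j mu(X_i | X_j = x_j) for every x_A in the support."""
    for _, a, b in support:
        joint = len(cond_on_pair(support, a, b))
        pairwise = min(
            len(cond_on_single(support, 1, a)),
            len(cond_on_single(support, 2, b)),
        )
        if joint > pairwise:
            return False
    return True


print(lemma13_holds(SUPPORT))
```

Conditioning on more variables can only shrink the set of compatible values, which is why the joint count never exceeds the smallest single-variable count.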




Fig. 4. "Two informants - Single sink" communication problem. 



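Lemma 14: $\widehat{\mu}_{X_i|X_A} \le \min_{X_j \in X_A} \widehat{\mu}_{X_i|X_j}$

Proof: From the definition of $\widehat{\mu}_{X_i|X_A}$, let $x_A^*$ be an instance of $X_A$ that maximizes $\mu_{X_i|X_A}(x_A)$. Similarly, by the definition of $\widehat{\mu}_{X_i|X_j}$, $X_j \in X_A$, let $x_j^*$ be an instance of $X_j$ that maximizes $\mu_{X_i|X_j}(x_j)$, $X_j \in X_A$. Therefore,

$$\widehat{\mu}_{X_i|X_A} = \mu_{X_i|X_A}(x_A^*)$$
$$\stackrel{(a)}{\le} \min_{X_j \in X_A} \mu_{X_i|X_j}(x_j^*)$$
$$= \min_{X_j \in X_A} \widehat{\mu}_{X_i|X_j},$$

where (a) follows from Lemma 13 together with the fact that $x_j^*$ maximizes $\mu_{X_i|X_j}(x_j)$, thus completing the proof. ■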

IV. Notation 

This section provides the notation used frequently in the rest of the paper.

$\mathcal{N}$: the set of $N$ informants.
$\mathcal{X}$: finite, discrete alphabet set of size $|\mathcal{X}|$.
$\mathcal{P}$: $N$-dimensional discrete probability distribution, $\mathcal{P} = p(x_1, \ldots, x_N)$, $x_i \in \mathcal{X}$.
$X_i$: random variable observed by informant $i$, $X_i \in \mathcal{X}$.
$S_{X_i}$: the ambiguity set of the $i$th informant's data, with corresponding ambiguity $\mu_{X_i} = |S_{X_i}|$.
$S_{X_i|I}$: the conditional ambiguity set of the sink in the $i$th informant's data when the sink has information $I$, which can be the set of values of one or more bit-locations in the representation of one or more informants' data. The exact nature of $I$ will be obvious from the context.
$\mu_{X_i|I}$: the conditional ambiguity, $|S_{X_i|I}|$.
$\widehat{\mu}_{X_i|I}$: the maximum conditional ambiguity, computed over all instances of $I$.
$S_{X_1,\ldots,X_N}$: the ambiguity set at the sink of all informants' data, with $\mu_{X_1,\ldots,X_N} = |S_{X_1,\ldots,X_N}|$ as the corresponding ambiguity.
$S_{X_1,\ldots,X_N|I}$: the conditional ambiguity set at the sink of all informants' data, with $\mu_{X_1,\ldots,X_N|I} = |S_{X_1,\ldots,X_N|I}|$ as the corresponding conditional ambiguity.
$S^k_{X_i}$: the $k$th-extension of ambiguity set $S_{X_i}$, $i \in \mathcal{N}$, with corresponding ambiguity $\mu^k_{X_i} = |S^k_{X_i}|$.
$S^k_{X_i|I}$: the conditional $k$th-extension of ambiguity set $S_{X_i}$ when the sink has information $I$, with corresponding conditional ambiguity $\mu^k_{X_i|I} = |S^k_{X_i|I}|$.
$S^k_{X_1,\ldots,X_N}$: the $k$th-extension of ambiguity set $S_{X_1,\ldots,X_N}$, with $\mu^k_{X_1,\ldots,X_N} = |S^k_{X_1,\ldots,X_N}|$ as the corresponding ambiguity.
$S^k_{X_1,\ldots,X_N|I}$: the conditional $k$th-extension of ambiguity set $S_{X_1,\ldots,X_N}$ when the sink has information $I$, with corresponding conditional ambiguity $\mu^k_{X_1,\ldots,X_N|I} = |S^k_{X_1,\ldots,X_N|I}|$.
$C_B$: the worst-case bit-compressibility of the distributed compression problem with a single instance of the source data-vector.
$C^k_B$: the worst-case bit-compressibility of the distributed compression problem with $k$, $k > 1$, instances of source data-vectors.

V. Problem Setting 

Consider a distributed information-gathering scenario, where a sink collects the data from $N$ informants sampling correlated data. Divide the sequence of events in this data-gathering problem into a data-generation epoch and a data-gathering epoch. In the data-generation epoch, a sample $x = (x_1, \ldots, x_N)$, $x \in S_{X_1,\ldots,X_N}$, is drawn from the discrete and finite support-set $S_{X_1,\ldots,X_N}$ over $N$ binary strings, as in JHJ, 1 1 1 1. The strings of $x$ are revealed to the informants, with the string $x_i$ being given to the $i$th informant, $i \in \mathcal{N}$. Then, in the data-gathering epoch, the sink wants to losslessly (error probability $P_e = 0$) learn the $x$ revealed to the informants. Each data-generation epoch is followed by a data-gathering epoch and vice versa.



Problem Statement: A sample $x = (x_1, \ldots, x_N)$ is drawn i.i.d. from the distribution $\mathcal{P}$ over $N$ binary strings. The strings of $x$ are revealed to the informants, with the string $x_i$ being given to the $i$th informant. The sink wants to learn each informant's string losslessly ($P_e = 0$). An informant may not learn about other informants' or the sink's data. Our primary objective is to minimize the total number of informant bits required, in the worst-case, to accomplish this, but we are also concerned with minimizing both the number of rounds and the number of sink bits. This is illustrated in Figure 4 for the scenario with two informants and one sink.

The Problem Setting: We consider an asymmetric communication scenario² [27]. Communication takes place over $N$ binary, error-free channels, where each channel connects an informant with the sink. An informant and the sink can interactively communicate over the channel between them by exchanging messages (finite sequences of bits determined by an agreed-upon, deterministic protocol). The informants cannot communicate directly with each other. We assume that in the data-gathering epoch, communication between the sink and the informants proceeds in rounds, as in [28]. In each round, depending on the information held by the communicators, one or the other communicator may send the first message. However, as argued in [25], if we allow empty messages and eliminate the last message if it is sent by the sink, then any sequence of messages can be converted into another sequence where the same communicator transmits the first message, with no increase in the worst-case communication complexity. Therefore, we assume that in each communication round, the sink first communicates to the informants and then the informants respond with their messages. Each bit communicated over any channel is counted as either a sink bit or an informant bit.

We assume the informants to be memoryless in the sense that they do not remember the messages they sent in previous rounds. However, we assume that the $i$th informant knows its support-set $S_{X_i}$, so that it represents the binary string $x_i$ given to it in $\mathcal{I}_{X_i} = \lceil \log \mu_{X_i} \rceil$ bits as $b_1 \ldots b_{\lceil \log \mu_{X_i} \rceil}$.

The sink knows the distribution $\mathcal{P}$ and the corresponding support set $S_{X_1,\ldots,X_N}$. So, every $x$, $x \in S_{X_1,\ldots,X_N}$, can be uniquely described at the sink using $\mathcal{I}_{X_1,\ldots,X_N} = \lceil \log \mu_{X_1,\ldots,X_N} \rceil$ bits. This implies that, in the worst-case, $\mathcal{I}_{X_1,\ldots,X_N}$ informant bits are necessary for the sink to learn $x$ unambiguously. However, this many informant bits may not be achievable, in general, by any communication protocol, as the sink needs to query the informants based on some function of an independent encoding of their data-strings that the informants can construct, rather than on some arbitrary encoding of $x$ that only the sink can construct. Still, as long as the sink can be assumed to know the joint distribution $\mathcal{P}$, there is at least one coding scheme that both the sources and the sink can construct without any explicit communication between them and that still achieves optimal distributed compression performance. Next, we propose one such encoding scheme, which the sink can construct to query the informants and the informants can use to respond to the sink's queries. This scheme allows us not only to compute the minimum achievable number of informant bits required for data-gathering at the sink, but also provides an efficient way to achieve it.

New Problem Encoding Scheme: As each informant $i$, $i \in \mathcal{N}$, knows its support-set $S_{X_i}$, it can describe each $x_i$, $x_i \in S_{X_i}$, as a set $B_{x_i}$ of $\mathcal{I}_{X_i}$ bits.³ Therefore, every $x$ can also be uniquely described at the sink as a set $B_x$ of $\sum_{i \in \mathcal{N}} \mathcal{I}_{X_i}$ bits, constructed by concatenating the $\mathcal{I}_{X_i}$-bit long representation of each $x_i$, $i \in \mathcal{N}$. This implies that $\sum_{i \in \mathcal{N}} \mathcal{I}_{X_i}$ informant bits are always sufficient for the sink to learn $x$ unambiguously. The following example illustrates this encoding scheme.

Example 1: Consider the example support-set shown in Figure 5. Let informants 1 and 2 observe the random variables $X_1$ and $X_2$, respectively. For the given support-set, at least $\mathcal{I}_{X_1,X_2} = 4$ bits are required to describe any element of $S_{X_1,X_2}$, and no fewer than 3 bits are required to independently describe a value that $X_1$ or $X_2$ takes.

For a given support-set, the sink can construct a figure similar to Figure 5. One of the strings from the fourth column is drawn, with the first $\mathcal{I}_{X_1}$ bits given to informant 1, the next $\mathcal{I}_{X_2}$ bits given to informant 2, and so on. The data-gathering problem is then that the sink wants to learn this string, whose different parts are held by different informants, with the informants sending the minimum total number of bits to the sink. ■
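The quantities quoted in Example 1 can be reproduced directly from the support set listed in Figure 5(c). The following Python sketch is illustrative only; the names `info_ambiguity`, `SUPPORT`, and `concatenated` are assumptions introduced here, and logarithms are taken to base 2.

```python
def info_ambiguity(count):
    """Information ambiguity ceil(log2(count)), computed with integer arithmetic."""
    return (count - 1).bit_length()


# Support set of Figure 5(c): (x1, x2) pairs of 3-bit strings.
SUPPORT = [
    ("000", "000"), ("000", "010"), ("001", "001"), ("001", "011"),
    ("010", "000"), ("010", "010"), ("010", "100"), ("011", "001"),
    ("011", "011"), ("100", "010"),
]

mu_joint = len(set(SUPPORT))
mu_x1 = len({x1 for x1, _ in SUPPORT})
mu_x2 = len({x2 for _, x2 in SUPPORT})

I_joint = info_ambiguity(mu_joint)  # bits to describe a pair jointly
I_x1 = info_ambiguity(mu_x1)        # bits informant 1 uses on its own
I_x2 = info_ambiguity(mu_x2)        # bits informant 2 uses on its own

# Concatenated representation of Figure 5(d): informant 1's bits, then informant 2's.
concatenated = [x1 + x2 for x1, x2 in SUPPORT]

print(mu_joint, I_joint, I_x1, I_x2, concatenated[1])
```

The gap between the joint requirement (4 bits) and the concatenated length (6 bits) is the room available for worst-case compression in this example.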

Given the above encoding scheme, the worst-case bit-compressibility $C_B$ of the distributed compression problem is defined as:

$$C_B = \max_{x \in S_{X_1,\ldots,X_N}} \; \min_{b_x \subseteq B_x} |b_x| \quad \text{such that } S_{X_1,\ldots,X_N|b_x} = \{x\}, \qquad (12)$$

where $S_{X_1,\ldots,X_N|b_x}$ denotes the conditional ambiguity-set of $(X_1, \ldots, X_N)$ when the bits corresponding to the subset $b_x$, $b_x \subseteq B_x$, are known at the sink.⁴ In other words, there is at least one $x \in S_{X_1,\ldots,X_N}$ such that no fewer than $C_B$ informant bits are sufficient to describe it at the sink unambiguously. Note that $\mathcal{I}_{X_1,\ldots,X_N} \le C_B \le \sum_{i \in \mathcal{N}} \mathcal{I}_{X_i}$.

Definition (Worst-case Bit-Compressibility): The distributed compression problem is called worst-case bit-compressible if $C_B < \sum_{i \in \mathcal{N}} \mathcal{I}_{X_i}$, otherwise bit-incompressible.

Note on the terminology: We call a bit undefined if the sink does not know its value, otherwise it is called defined. For example, until the sink learns the actual $x$ revealed to the informants, one or more bits in the $\sum_{i=1}^{N} \mathcal{I}_{X_i}$ bits long representation

2 Our formalism can also be applied to communication scenarios where even the sink does not know $\mathcal{P}$, but can estimate it as it collects the data from the informants, drawn from $\mathcal{P}$. For example, in [5], a linear predictive model is used to estimate the correlation structure. It should be noted that we assume nothing about this distribution, except that it is a discrete distribution with finite alphabet.

3 This encoding scheme can be agreed upon a priori between the sink and each informant.

4 Note that $b_x$ can be written as the concatenation of $b_{x_i} \subseteq B_{x_i}$, $i \in \mathcal{N}$.
[Figure 5, panels (a)-(b): support-set plot and member list; recoverable labels: (3,3), (3,5), (4,2)]

(c) Binary representation    (d) Concatenated binary representation
(000, 000)                   000000
(000, 010)                   000010
(001, 001)                   001001
(001, 011)                   001011
(010, 000)                   010000
(010, 010)                   010010
(010, 100)                   010100
(011, 001)                   011001
(011, 011)                   011011
(100, 010)                   100010

Fig. 5. Example of problem setting: (a) Support-sets $S_{X_1,X_2}$, $S_{X_1}$, $S_{X_2}$ with $\mu_{X_1,X_2} = 10$, $\mu_{X_1} = \mu_{X_2} = 5$ (b) the members of $S_{X_1,X_2}$ (c) binary representation of the members of $S_{X_1,X_2}$ (d) the concatenated binary representation. If the string '000010' is drawn, then '000' is given to informant 1 and '010' is given to informant 2.



of $x$ remain undefined. Similarly, informant $i$, $i \in \mathcal{N}$, is called undefined if the sink does not know the corresponding $x_i$ exactly, otherwise the informant is called defined.

VI. Bit-compressibility for Distributed Compression 

In this section, we address the problem of worst-case distributed compression in two different communication scenarios. In the first scenario, which we call the bit serial communication scenario, only one informant can send one bit of information to the sink in each communication round of a data-gathering epoch. This allows us to compute the minimum number of informant bits (total and individual) required to enable the sink to learn the particular $x$, $x \in S_{X_1,\ldots,X_N}$, revealed to the informants in the data-generation epoch, when any number of rounds and sink bits can be used. In other words, this communication scenario allows us to compute the largest worst-case achievable rate-region for this problem, as we show later. In the second scenario, which we call the round parallel communication scenario, one or more informants can send one or more bits in parallel to the sink. This, as we argue and show later, allows us to exploit various trade-offs among the number of informant bits, the number of sink bits, and the number of rounds.

Definition (Achievable Rate-Region): The achievable rate-region $\mathcal{R}$ for the worst-case distributed source coding problem with $N$ informants is defined as the set of all $N$-tuples $(b_{x_1}, \ldots, b_{x_N})$ of informant rates (in bits) such that when the $i$th informant, $i \in \mathcal{N}$, sends the subset $b_{x_i}$, $b_{x_i} \subseteq B_{x_i}$, of bits in the particular rate-tuple, the sink is able to retrieve $x$ unambiguously, that is:

$$\mathcal{R} = \{(b_{x_1}, \ldots, b_{x_N}) \mid S_{X_1,\ldots,X_N|(b_{x_1},\ldots,b_{x_N})} = \{x\} \text{ and } b_{x_i} \subseteq B_{x_i}, i \in \mathcal{N}\} \qquad (13)$$

A. Bit Serial Communication

We discuss the optimal solution of the distributed compression problem introduced in the previous section. We first provide 
an interactive communication protocol, called "Bit-Serial" protocol, and then prove that it optimally solves the problem. 
Further, we show that "Bit-Serial" protocol also allows us to compute the maximum achievable rate-region of the distributed 
compression problem we are concerned with. Next, we describe "Bit-Serial" protocol in detail. 

Bit-Serial Protocol: Consider an interactive communication protocol where in each round only one bit is sent by the informant 
chosen to communicate with the sink. The chosen bit has the property that it divides the current conditional ambiguity set at 
the sink maximally close to half among all candidate bits. This offers the opportunity to optimally minimize the number of 
informant bits, as it maximally conditions the ambiguity sets of the informants at the sink. 

Consider the $l$th communication round. At the beginning of the $l$th round, let $U$ and $D$ denote, respectively, the sets of undefined and defined bit-locations among the $\sum_{i=1}^{N} \mathcal{I}_{X_i}$ bits long representation of $x$ at the sink, $|U| + |D| = \sum_{i=1}^{N} \mathcal{I}_{X_i}$. The ambiguity at the sink in all informants' data is $\mu^l_{X_1,\ldots,X_N} = \mu_{X_1,\ldots,X_N|D}$. Let $N^0_i$, $N^1_i$ respectively denote the number of 0s and 1s at the bit location $i \in U$, over all $\mu^l_{X_1,\ldots,X_N}$ strings. Then the chosen bit is the one that solves $\arg\min_{i \in U} |N^0_i - N^1_i|$. The sink, after receiving the value of the chosen bit, recomputes the set of undefined bits $U$. This is carried out iteratively until all bits in the $\sum_{i=1}^{N} \mathcal{I}_{X_i}$ bits long representation of $x$ are defined. This is formally summarized in the "Bit-Serial" protocol given below.



Protocol: Bit-Serial

1  $l = 0$
2  Let $S^l_{X_1,\ldots,X_N} = S_{X_1,\ldots,X_N}$
3  Let $V = \{1, \ldots, \sum_{i=1}^{N} \mathcal{I}_{X_i}\}$: index set of all bit-locations in $B_x$
4  Let $U$ be the index set of undefined bits in $V$, $U \subseteq V$
5  while ($\mu^l_{X_1,\ldots,X_N} > 1$)
6      $J^{l+1} = \arg\min_{i \in U} |N^0_i - N^1_i|$
7      If $|J^{l+1}| > 1$, then choose uniformly at random the bit-location $j^{l+1} \in J^{l+1}$
8      The sink asks the informant corresponding to bit-location $j^{l+1}$ to send bit-value $b(j^{l+1})$
9      Set $S^{l+1}_{X_1,\ldots,X_N} = S^l_{X_1,\ldots,X_N|b(j^{l+1})}$
10     Compute $U \subseteq V$, the set of undefined bits
11     $l = l + 1$

The sink can perform the worst-case performance analysis of Bit-Serial protocol by selecting, on Line 8, the value $b^*(j^{l+1})$ that solves:

$$b^*(j^{l+1}) = \arg\max_{s \in \{0,1\}} \mu^l_{X_1,\ldots,X_N|b(j^{l+1}) = s}$$
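The protocol and its worst-case analysis can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: bit-locations are 0-indexed, ties in the argmin are broken by taking the lowest index instead of a uniformly random choice (so that runs are reproducible), and the names `bit_serial`, `truthful`, and `adversary` are invented here. The input is the concatenated representation of Figure 5(d).

```python
STRINGS = [
    "000000", "000010", "001001", "001011", "010000",
    "010010", "010100", "011001", "011011", "100010",
]


def undefined_bits(candidates):
    """Bit-locations (0-indexed) on which the remaining candidates still disagree."""
    width = len(candidates[0])
    return [i for i in range(width) if len({s[i] for s in candidates}) > 1]


def most_balanced_bit(candidates):
    """Undefined bit-location minimising |N0 - N1|; ties go to the lowest index."""
    def imbalance(i):
        zeros = sum(1 for s in candidates if s[i] == "0")
        return abs(zeros - (len(candidates) - zeros))
    return min(undefined_bits(candidates), key=imbalance)


def bit_serial(strings, oracle):
    """Run one data-gathering epoch; oracle(i, candidates) plays the informant."""
    candidates = list(strings)
    asked = []
    while len(candidates) > 1:
        i = most_balanced_bit(candidates)
        b = oracle(i, candidates)
        candidates = [s for s in candidates if s[i] == b]
        asked.append(i)
    return candidates[0], asked


def truthful(x):
    """Informants holding the actual string x."""
    return lambda i, candidates: x[i]


def adversary(i, candidates):
    """Worst-case answer: the bit value that keeps the larger part of the set."""
    zeros = sum(1 for s in candidates if s[i] == "0")
    return "0" if zeros >= len(candidates) - zeros else "1"


learned, asked = bit_serial(STRINGS, truthful("000010"))
worst, worst_asked = bit_serial(STRINGS, adversary)
print(learned, len(asked), worst, len(worst_asked))
```

With the adversarial oracle, the number of queried bits is the worst-case cost of this particular greedy run; it cannot be smaller than $\lceil \log 10 \rceil = 4$ and cannot exceed the 6 bits of the concatenated representation.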

The binary representations of the elements of $S_{X_1,\ldots,X_N}$ in terms of $B_x$, as in Figure 5(d), can be arranged as the leaves of a binary tree. For each of the $\sum_{i=1}^{N} \mathcal{I}_{X_i}$ bit-locations in the $B_x$ representation of $x$, there is a binary tree rooted at that location, with all other locations forming the internal nodes of the tree. At any node in the tree, the bit-value '0' leads to the left subtree and '1' leads to the right subtree. Such a binary tree with $\mu_{X_1,\ldots,X_N}$ leaves will have a minimum height of $\mathcal{I}_{X_1,\ldots,X_N}$, implying that at least $\mathcal{I}_{X_1,\ldots,X_N}$ bits are required to describe any leaf, in the worst-case. Figure 6 provides the canonical representation of one of the possible binary trees for the distributed compression problem in Figure 5.

We show that the problem of minimizing the total number of bits $C_B$ that the informants must send to the sink to help it learn any $x \in S_{X_1,\ldots,X_N}$ is equivalent to the problem of constructing a minimum-height binary tree for the concatenated bit-representations of the elements of $S_{X_1,\ldots,X_N}$. We prove that Bit-Serial protocol constructs such trees for a given support set and so optimally solves the worst-case asymmetric distributed compression problem.

Lemma 15: Bit-Serial protocol computes all minimum-height binary trees corresponding to the given support-set.

Proof: In the canonical representation, as in Figure 6, of a minimum-height binary tree corresponding to the given support-set, every node corresponds to the bit-location that divides the resultant conditional ambiguity set as close to half as possible. Bit-Serial protocol chooses precisely the same bit-location in the round corresponding to the level of the node concerned, thus proving the lemma. ■

Denote the set of all minimum-height binary trees as $\mathcal{T}$. Let $b_i^j$ denote the number of bits that the $i$th informant, $i \in \mathcal{N}$, sends in the worst-case under the $j$th minimum-height binary tree, $j \in \mathcal{T}$. Then, we have the following lemma.

Lemma 16: Bit-Serial protocol computes $b_i$, the minimum number of bits that the $i$th informant needs to send to the sink to get defined.

Proof: Bit-Serial protocol exploits the bit serial communication scenario, where the informant chosen to communicate in a round can send only one bit of information, to maximally condition the resultant ambiguity set at the sink. Also, to reduce the number of bits that an informant sends, Bit-Serial protocol can postpone retrieving the bits from the informant concerned until it can be postponed no more, thus maximally conditioning the ambiguity set at the sink of the informant concerned. These two arguments together prove the lemma. ■

Combining the previous two lemmas allows us to define $b_i$ as:

$$b_i = \min_{j \in \mathcal{T}} b_i^j, \qquad (14)$$

Lemma 17: For a given support-set, each corner point of the worst-case achievable rate-region corresponds to at least one minimum-height binary tree, with height $C_B$.

Proof: For the sake of contradiction, assume that there is a corner point of the worst-case achievable rate-region to which no minimum-height binary tree corresponds. This means that this corner point is outside the worst-case rate-region defined by the set of all the corner points visited by the set of minimum-height binary trees, $\mathcal{T}$. This further implies that at this corner point at least one informant, say the $i$th, sends fewer bits than $b_i$, with $b_i$ as defined above. However, this contradicts the definition of $b_i$ as the minimum number of bits an informant needs to send to the sink before it is defined. Thus, there
Fig. 6. Example 1: (a) Support set $S_{X_1,X_2}$, with $\mu_{X_1,X_2} = 10$, $\mathcal{I}_{X_1,X_2} = 4$ (b) one of the minimum-height binary trees generated by Bit-Serial protocol corresponding to $S_{X_1,X_2}$. The number appearing on the left of every node corresponds to the bit-location in the concatenated binary representation, as in Figure 5(d), of the elements of $S_{X_1,X_2}$ (c) worst-case achievable rate-region. The string '000010', drawn as in Figure 5, is highlighted.



cannot be any corner point outside the rate-region defined by the set of corner points defined by the minimum-height binary trees in $\mathcal{T}$, hence proving the lemma. ■

Lemma 18: Protocol Bit-Serial is worst-case optimal.

Proof: Combining the statements of Lemmas 15 and 17, we can state that Bit-Serial protocol computes at least one minimum-height binary tree corresponding to each corner point of the worst-case achievable rate-region. Therefore, Bit-Serial protocol computes each corner point of the achievable rate-region. Thus, Bit-Serial protocol computes the worst-case achievable rate-region, hence it is worst-case optimal. ■

For two informants, the worst-case achievable rate-region in the asymmetric distributed compression problem is given by the following corollary to Lemma 18.



Corollary 2: For $N = 2$, if $b_i$ denotes the minimum number of bits that informant $i$, $1 \le i \le 2$, sends over all solutions of Bit-Serial protocol and $C_B$ denotes the minimum total number of bits sent by all informants, then the achievable rate region is given by:

$$R_1 \ge b_1$$
$$R_2 \ge b_2$$
$$R_1 + R_2 \ge C_B$$

Proof: Follows from the worst-case optimality of Bit-Serial protocol proven above. ■

For $N$ informants, the worst-case achievable rate-region in the asymmetric distributed compression problem is given by the following corollary to Lemma 18.

Corollary 3: The set of achievable rate-vectors for the worst-case DSC problem for one-shot compression is given by $R(S) \ge M_B(S)$ for $S$, $S \subseteq \mathcal{N}$, where $M_B(S)$ is the minimum number of bits that the subset of informants $S$ sends over all possible solutions of (12), and $R(S) = \sum_{i \in S} R_i$.

Proof: The proof follows from establishing the worst-case optimality of Bit-Serial protocol in computing $M_B(S)$. This can be achieved by a straightforward generalization, to arbitrary subsets $S$, $S \subseteq \mathcal{N}$, of the argument above that proves the optimality of Bit-Serial protocol. ■

In Figures 6-8, for three different support sets, we give one of the many possible corresponding minimum-height trees computed by Bit-Serial protocol and the corresponding worst-case achievable rate regions.

Upper bound on $C_B$: In Bit-Serial protocol, as only one information bit is sent per communication round, the total number of rounds required is equal to $C_B$. Assume that in the $i$th communication round, $1 \le i \le C_B$, the size of the ambiguity set is reduced by a factor of $2^{(1-\epsilon_i)}$, $1 - \mathcal{I}_{X_1,\ldots,X_N} \le \epsilon_i < 1$. So, after $C_B$ rounds, we have

$$\frac{\mu_{X_1,\ldots,X_N}}{2^{\sum_{s=1}^{C_B}(1-\epsilon_s)}} = 1.$$

Define $\epsilon = \max\{\epsilon_1, \ldots, \epsilon_{C_B}\}$. Assume that the size of the ambiguity set in every round is reduced by a factor of $2^{(1-\epsilon)}$. Assume that the data-gathering now finishes in $k$ rounds. It is obvious that $C_B \le k$. Now, the size of the ambiguity set after $k$ rounds satisfies $\frac{\mu_{X_1,\ldots,X_N}}{2^{k(1-\epsilon)}} = 1$. This implies that $k = \left\lceil \frac{\log \mu_{X_1,\ldots,X_N}}{1-\epsilon} \right\rceil$.
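As a worked instance of this bound, read as $k = \lceil \log \mu_{X_1,\ldots,X_N} / (1-\epsilon) \rceil$, take the support set of Example 1 with $\mu_{X_1,X_2} = 10$. The value $\epsilon = 1/4$ is an assumption chosen purely for illustration; it is not derived from the example.

```latex
\begin{align*}
k &= \left\lceil \frac{\log_2 10}{1 - \tfrac{1}{4}} \right\rceil
   = \left\lceil \frac{3.32\ldots}{0.75} \right\rceil
   = \lceil 4.42\ldots \rceil
   = 5, \\
C_B &\le k = 5 .
\end{align*}
```

With $\epsilon = 0$ (every polled bit halves the ambiguity set exactly) the same formula gives $k = \lceil 3.32\ldots \rceil = 4 = \mathcal{I}_{X_1,X_2}$, which is the best possible case.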

Upper bound on number of sink bits: As there are $N$ informants, under Bit-Serial protocol, in the $i$th communication round the sink addresses the chosen informant in $\lceil \log N \rceil$ bits and then, in $\lceil \log\log \mu_{X_i} \rceil$ bits, addresses the chosen bit corresponding to this informant. So, in the $i$th communication round, the sink sends a total of $\lceil \log N \rceil + \lceil \log\log \mu_{X_i} \rceil$ bits, implying that to gather $C_B$ information bits the sink sends a total of $C_B \lceil \log N \rceil + \sum_i \lceil \log\log \mu_{X_i} \rceil$ bits.
Fig. 7. Example 2: (a) Support set $S_{X_1,X_2}$, with $\mu_{X_1,X_2}$ = l n, $\mathcal{I}_{X_1,X_2} = 4$ (b) one of the minimum-height binary trees generated by Bit-Serial corresponding to $S_{X_1,X_2}$ (c) worst-case achievable rate-region

Fig. 8. Example 3: (a) Support set $S_{X_1,X_2}$, with $\mu_{X_1,X_2} = 9$, $\mathcal{I}_{X_1,X_2} = 4$ (b) one of the minimum-height binary trees generated by Bit-Serial corresponding to $S_{X_1,X_2}$ (c) worst-case achievable rate-region



B. Round Parallel Communication 

In this subsection, we investigate the worst-case performance of the interactive distributed compression problem in an asymmetric communication scenario where, in each communication round, one or more informants may send more than one bit to the sink in parallel. More precisely, we use parallel communication to mean round parallel communication, which we define as a communication scenario where, in each communication round, two or more bits can be sent by one or more informants to the sink. Therefore, in round parallel communication, two or more informants may or may not communicate in parallel in the classical sense, that is, their communications may or may not overlap in time.

Round parallel communication allows us to exploit the trade-off between the number of rounds and the number of informant bits. At one extreme is Bit-Serial protocol, with the minimum number of informant bits and an unconstrained number of rounds, and at the other extreme is a scheme where as many as $\sum_{i=1}^{N} \mathcal{I}_{X_i}$ informant bits are sent (as each informant $i$ encodes its data-value in $\mathcal{I}_{X_i}$ bits) in a single round.

We provide a round parallel communication protocol that, as we prove, among all round parallel communication protocols 
minimizes the total number of informant bits in the worst-case as well as the number of sink bits and the number of rounds. 

Round-Parallel Protocol: Consider the set of concatenated bit-strings of length $\sum_{i=1}^{N} \mathcal{I}_{X_i}$, where each bit-string corresponds to an element of $S_{X_1,\ldots,X_N}$, as in Figure 5(d). Consider the $l$th communication round. At the beginning of the $l$th round, let $U$ and $D$ denote, respectively, the sets of undefined and defined bit-locations among the $\sum_{i=1}^{N} \mathcal{I}_{X_i}$ bits, $|U| + |D| = \sum_{i=1}^{N} \mathcal{I}_{X_i}$. The ambiguity at the sink in all informants' data is $\mu^l_{X_1,\ldots,X_N} = \mu_{X_1,\ldots,X_N|D}$. Let $N^0_i$, $N^1_i$ respectively denote the number of 0s and 1s at the bit location $i \in U$, over all $\mu^l_{X_1,\ldots,X_N}$ strings. The sink computes $J^l$, the set of indices of those $\lceil \log \mu^l_{X_1,\ldots,X_N} \rceil$ bit locations which divide the successive conditional ambiguity sets as close to half as possible. Set $J^l$ is defined as:

$$J^l = \{ j^l_k, \; 1 \le k \le \lceil \log \mu^l_{X_1,\ldots,X_N} \rceil \},$$

where $j^l_k$ is defined as:

$$j^l_k = \arg\min_{i \in U | \{b^*(j^l_1), \ldots, b^*(j^l_{k-1})\}} |N^0_i - N^1_i| \qquad (15)$$

and $b^*(j^l_t) \in \{0, 1\}$, $t = 1, \ldots, k-1$, is defined as follows:

$$b^*(j^l_t) = \arg\max_{s \in \{0,1\}} \mu^l_{X_1,\ldots,X_N | \{b^*(j^l_1), \ldots, b^*(j^l_{t-1}), b(j^l_t) = s\}} \qquad (16)$$

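We summarize this formally as "Round-Parallel" protocol, given below.

Protocol: Round-Parallel

1  $l = 0$
2  Let $S^l_{X_1,\ldots,X_N} = S_{X_1,\ldots,X_N}$
3  Let $V = \{1, \ldots, \sum_{i=1}^{N} \mathcal{I}_{X_i}\}$
4  Let $U$ be the set of undefined bits in $V$, $U \subseteq V$, over all $x \in S^l_{X_1,\ldots,X_N}$
5  while ($\mu^l_{X_1,\ldots,X_N} > 1$)
6      $J^{l+1} = \emptyset$
7      for ($k = 1, \ldots, \lceil \log \mu^l_{X_1,\ldots,X_N} \rceil$)
8          $j^{l+1}_k = \arg\min_{i \in U | \{b^*(j^{l+1}_1), \ldots, b^*(j^{l+1}_{k-1})\}} |N^0_i - N^1_i|$, where $b^*(j^{l+1}_t)$, $t = 1, \ldots, k-1$, is defined as in (16)
9          Compute $U \subseteq V$, the set of undefined bits
10     The sink asks the informants corresponding to the bit-locations in $J^{l+1}$ to send the bit-values $b(j^{l+1}_k)$, $k = 1, \ldots, \lceil \log \mu^l_{X_1,\ldots,X_N} \rceil$
11     Compute $S^{l+1}_{X_1,\ldots,X_N} = S^l_{X_1,\ldots,X_N | b(j^{l+1}_1), \ldots, b(j^{l+1}_{\lceil \log \mu^l_{X_1,\ldots,X_N} \rceil})}$
12     Compute $U \subseteq V$, the set of undefined bits
13     $l = l + 1$

The worst-case behavior of Round-Parallel protocol can be analyzed by assuming the set of informant bits in Line 10 to be the same as the set of their worst-case values, that is, $b(j^{l+1}_k) = b^*(j^{l+1}_k)$.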
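A minimal Python sketch of one possible reading of the protocol follows. It is an illustration under stated assumptions rather than the authors' implementation: bit-locations are 0-indexed, argmin ties go to the lowest index, the inner loop stops early if the hypothetical worst-case set becomes a singleton before $\lceil \log \mu^l \rceil$ locations are chosen, and the names `choose_round_bits` and `round_parallel` are invented here. The input is the concatenated representation of Figure 5(d).

```python
STRINGS = [
    "000000", "000010", "001001", "001011", "010000",
    "010010", "010100", "011001", "011011", "100010",
]


def ceil_log2(n):
    """ceil(log2(n)) with integer arithmetic."""
    return (n - 1).bit_length()


def undefined_bits(candidates):
    """Bit-locations (0-indexed) on which the candidates still disagree."""
    width = len(candidates[0])
    return [i for i in range(width) if len({s[i] for s in candidates}) > 1]


def imbalance(candidates, i):
    """|N0 - N1| at bit-location i."""
    zeros = sum(1 for s in candidates if s[i] == "0")
    return abs(zeros - (len(candidates) - zeros))


def choose_round_bits(candidates):
    """Pick the bit-locations for one round, assuming worst-case values for earlier picks."""
    hypothetical = list(candidates)
    chosen = []
    for _ in range(ceil_log2(len(candidates))):
        options = undefined_bits(hypothetical)
        if not options:  # assumption: stop once the hypothetical set is a singleton
            break
        i = min(options, key=lambda j: imbalance(hypothetical, j))
        chosen.append(i)
        zeros = [s for s in hypothetical if s[i] == "0"]
        ones = [s for s in hypothetical if s[i] == "1"]
        hypothetical = zeros if len(zeros) >= len(ones) else ones
    return chosen


def round_parallel(strings, x):
    """Gather the string x; returns (learned string, rounds, informant bits)."""
    candidates = list(strings)
    rounds = 0
    informant_bits = 0
    while len(candidates) > 1:
        chosen = choose_round_bits(candidates)
        candidates = [s for s in candidates if all(s[i] == x[i] for i in chosen)]
        rounds += 1
        informant_bits += len(chosen)
    return candidates[0], rounds, informant_bits


print(round_parallel(STRINGS, "000010"))
```

Because all locations of a round are fixed before any answer arrives, some of the polled bits may turn out to be redundant for the actual $x$; this is the price paid for using fewer rounds than Bit-Serial.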

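Upper bound on number of rounds: Suppose that the data-gathering finishes in $k$ communication rounds and in every round the informants send $\mathcal{I}_{X_1,\ldots,X_N}$ bits. Assume that in the $i$th communication round, $1 \le i \le k$, the size of the ambiguity set is reduced by a factor of $2^{(1-\epsilon_i)\lceil \log \mu_{X_1,\ldots,X_N} \rceil}$, $\epsilon_i < 1$. So, after $k$ rounds, we have

$$\frac{\mu_{X_1,\ldots,X_N}}{2^{\sum_{i=1}^{k}(1-\epsilon_i)\lceil \log \mu_{X_1,\ldots,X_N} \rceil}} = 1.$$

Define $\epsilon = \max\{\epsilon_1, \ldots, \epsilon_k\}$. Assume that the size of the ambiguity set in every round is reduced by a factor of $2^{(1-\epsilon)\lceil \log \mu_{X_1,\ldots,X_N} \rceil}$. Assume that the data-gathering now finishes in $k^*$ rounds. It is obvious that $k \le k^*$. Now, the size of the ambiguity set after $k^*$ rounds satisfies

$$\frac{\mu_{X_1,\ldots,X_N}}{\left(2^{(1-\epsilon)\mathcal{I}_{X_1,\ldots,X_N}}\right)^{k^*}} = 1.$$

So,

$$k^* = \left\lceil \frac{\log \mu_{X_1,\ldots,X_N}}{(1-\epsilon)\,\mathcal{I}_{X_1,\ldots,X_N}} \right\rceil \le \left\lceil \frac{1}{1-\epsilon} \right\rceil.$$

This implies that $k^* = 1$ if $\epsilon \le 0$; $k^* \le 2$ if $0 < \epsilon$, $\epsilon \approx 0$; and $k^* \le \left\lceil \frac{1}{1-\epsilon} \right\rceil$ if $\epsilon \approx 1$.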
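As a worked instance, reading the bound as $k^* = \lceil \log \mu / ((1-\epsilon)\,\mathcal{I}) \rceil \le \lceil 1/(1-\epsilon) \rceil$, take the support set of Example 1 with $\mu_{X_1,X_2} = 10$ and $\mathcal{I}_{X_1,X_2} = 4$. The value $\epsilon = 1/2$ is an assumption chosen purely for illustration.

```latex
\begin{align*}
k^* &= \left\lceil \frac{\log_2 10}{\left(1 - \tfrac{1}{2}\right) \cdot 4} \right\rceil
     = \left\lceil \frac{3.32\ldots}{2} \right\rceil
     = \lceil 1.66\ldots \rceil
     = 2, \\
\left\lceil \frac{1}{1 - \tfrac{1}{2}} \right\rceil &= 2 \;\ge\; k^* .
\end{align*}
```

With $\epsilon = 0$ the same expression gives $k^* = \lceil 3.32\ldots / 4 \rceil = 1$, matching the first case listed above.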

Upper bound on number of informant bits: The total number of informant bits sent in a round is upper-bounded by $\mathcal{I}_{X_1,\ldots,X_N}$, so the total number of informant bits sent over all rounds, $\sum_{l=1}^{k^*} \lceil \log \mu^l_{X_1,\ldots,X_N} \rceil$, is upper-bounded by $k^* \mathcal{I}_{X_1,\ldots,X_N}$.

Upper bound on number of sink bits: The sink can address each of the $N$ informants in $\lceil \log N \rceil$ bits. So, it addresses all informants in $N \lceil \log N \rceil$ bits. In $N$ more bits, it informs all informants whether or not they have to transmit anything in the current communication round. The sink uses $\lceil \log\log \mu_{X_i} \rceil$ bits to ask informant $i$ to send the bit-value at the corresponding bit-index. So, the total number of bits that the sink sends over all rounds is upper-bounded by $k^* \left( N \lceil \log N \rceil + N + \sum_{i \in J} \lceil \log\log \mu_{X_i} \rceil \right)$. In the case when all the informants encode their information in $\lceil \log n \rceil$ bits each, the total number of sink bits is upper-bounded by $k^* \left( N \lceil \log N \rceil + N + \mathcal{I}_{X_1,\ldots,X_N} \lceil \log\log n \rceil \right)$.

Lemma 19: The total number of informant bits under Round-Parallel protocol upper-bounds the total number of informant bits under Bit-Serial protocol.

Proof: In Bit-Serial protocol, the optimal bit-location (in the sense of dividing the resultant ambiguity set as close to half as possible) to be polled in a round is determined by the actual values of the previously polled optimal bit-locations. However, in Round-Parallel protocol, in the $l$th round, $l \ge 1$, out of the $\lceil \log \mu^l_{X_1,\ldots,X_N} \rceil$ bit-locations to be polled, all except the first are selected by assuming that the previously chosen bit-locations take their worst-case bit-values, as in (16). This implies that Round-Parallel protocol, while provisioning for the worst-case, over-estimates the total number of informant bits compared to Bit-Serial protocol. Therefore, the number of informant bits under Round-Parallel protocol upper-bounds the total number of informant bits under Bit-Serial protocol, thus proving the lemma. ■



Corollary 4: The performance of Round-Parallel is the same as that of Bit-Serial on those elements of the support-set on
which the latter achieves its worst-case performance, in terms of total number of informant bits.

Proof: For those members of the support-set on which Bit-Serial protocol performs the worst, Round-Parallel protocol,
while provisioning for the worst-case, chooses precisely the values of the same bit-locations to be communicated as Bit-Serial
protocol, thus achieving identical performance. ■

All parallel protocols require a larger total number of informant bits, in the worst-case, than Bit-Serial. However,
among all such round parallel protocols, Round-Parallel provides the best worst-case performance, as the next lemma states.

Lemma 20: The Round-Parallel protocol is an optimal round parallel communication protocol.

Proof: We prove the lemma by considering the following implication. No round parallel protocol can do better than
Round-Parallel protocol in the following sense: the total number of informant bits and the number of rounds it requires for
a given support-set are no less than those required by Round-Parallel protocol for all elements of the given support-set.

Assume, for the sake of contradiction, that there is a round parallel communication protocol, call it Protocol X, that
is better than Round-Parallel protocol. This implies that Protocol X achieves at least one of the following:

Case 1: Under Protocol X, fewer informant bits are sent in fewer communication rounds compared to Round-Parallel protocol.

Case 2: Under Protocol X, fewer informant bits are sent in the same number of communication rounds compared to Round-Parallel
protocol.

Case 3: Under Protocol X, the same number of informant bits are sent in fewer communication rounds compared to Round-Parallel
protocol.

Now, we prove that each of these three cases leads to a contradiction.

Case 1: If Protocol X sends fewer informant bits than Round-Parallel protocol for all elements of the support-set, then
this implies that even for the elements on which Bit-Serial protocol or Round-Parallel protocol (from Corollary 4) achieves
its worst-case performance, in terms of total number of informant bits, Protocol X can achieve better performance. However,
this contradicts the worst-case optimality of Bit-Serial protocol (from Lemma 18).

Case 2: Applying the same reasoning as in Case 1 to this case leads to a similar contradiction.

Case 3: For a given support-set, if Round-Parallel protocol finishes the data-gathering in $k$, $k > 1$, rounds in the worst-case,
then we can always construct a round parallel communication protocol, Protocol X, that finishes the data-gathering in
$k'$, $1 \le k' < k$, rounds with the same number of informant bits as Round-Parallel protocol in the worst-case. However, any such
protocol, while provisioning for the worst-case, ends up sending more informant bits and requiring more rounds than Round-Parallel
protocol for those elements of the support-set on which Round-Parallel protocol does not achieve worst-case optimal
performance.

Each of these cases shows that there are always some elements of the support-set on which Protocol X performs worse
than Round-Parallel protocol. Therefore, no round parallel communication protocol can do better than Round-Parallel
protocol for all elements of the given support-set. ■

VII. Role of Block-coding in the Worst-case Distributed Compression 

We have, thus far, discussed the notion of worst-case compressibility in distributed source coding scenario when only a 
single instance of data-vector is available at the informants (oneshot compression). However, the majority of results in classical 
Information Theory are derived in the limit of asymptotic block-lengths, though recently the role of non-asymptotic block- 
lengths has been investigated p9)-pT). These results firmly establish the effectiveness of block-coding in achieving the optimal 
average-case performance of various information-theoretic problems. In this section, we attempt to investigate the effectiveness 
of block-coding in realizing the optimal worst-case performance of asymmetric distributed source coding problem. 

Formally, we are concerned with the question of whether solving the worst-case bit-compressibility problem over
block-length $k$, $k \ge 1$, results in fewer informant bits and a larger achievable rate-region than solving this problem $k$ times over
a single instance of data, as in (12). To aid our subsequent analysis, we introduce some definitions.

Define $S^1_{X_1,\ldots,X_N} = S_{X_1,\ldots,X_N}$. Then, for $k > 1$, the $k$-th extension of support-set $S_{X_1,\ldots,X_N}$ is:

$$S^k_{X_1,\ldots,X_N} = S^{k-1}_{X_1,\ldots,X_N} \times S_{X_1,\ldots,X_N} \qquad (17)$$

The $k$-th extension of data-vector $x$, $x \in S_{X_1,\ldots,X_N}$, is:

$$x^k = (x_1, \ldots, x_N)^k = (x_1^1 \cdots x_1^k, \ldots, x_N^1 \cdots x_N^k) = (x_1^k, \ldots, x_N^k)$$

Then, the $k$-th extension of support-set $S_{X_i}$, $i \in \mathcal{N}$, is:

$$S^k_{X_i} = \{x_i^k \mid \text{for some } x_{-i}^k,\ (x_{-i}^k, x_i^k) \in S^k_{X_1,\ldots,X_N}\} \qquad (18)$$

Note that $|S^k_{X_1,\ldots,X_N}| = \mu^k_{X_1,\ldots,X_N}$ and $|S^k_{X_i}| = \mu^k_{X_i}$.
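
As an illustration of these definitions, the following minimal sketch builds the $k$-th extension of a small support-set and the per-informant extended support-sets, and lets one check the stated cardinalities. The toy support-set `S` and the helper names `extend_support` and `informant_blocks` are assumptions made for this illustration only; an extended data-vector is represented here as a $k$-tuple of samples, and the per-informant blocks are obtained by regrouping.

```python
from itertools import product


def extend_support(support, k):
    """Return the k-th extension S^k as the set of all k-tuples of data-vectors from S."""
    return set(product(support, repeat=k))


def informant_blocks(extended, i):
    """Project each extended data-vector onto informant i's k-block (x_i^1, ..., x_i^k)."""
    return {tuple(sample[i] for sample in xk) for xk in extended}


# Hypothetical toy support-set over N = 2 informants (not taken from the paper).
S = {(0, 0), (0, 1), (1, 1)}
S2 = extend_support(S, 2)

print(len(S2))                        # |S^2| = |S|^2
print(len(informant_blocks(S2, 0)))   # |S^2_{X_1}| = |S_{X_1}|^2
```

In this toy case $|S| = 3$ and each informant sees two distinct values, so the sketch reports $3^2 = 9$ extended data-vectors and $2^2 = 4$ blocks for the first informant.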



TABLE I. Example of the problem-encoding scheme for DSC with block-coding, where column (a) refers to the support-set $S_{X_1,X_2}$, (b) refers to elements of
$S^2_{X_1,X_2}$ (2-extended support-set), (c) refers to the 2-block data of the informants, (d) refers to the binary representation, and (e) refers to the concatenated representation.

(a) Support-set $S_{X_1,X_2}$ (a star marks a pair that belongs to the support-set):

    X_1 \ X_2    1    2    3    4    5
        1        *         *
        2             *         *
        3        *         *         *
        4             *         *
        5                  *

    (b)              (c)              (d)                  (e)
    (1,1) (1,1)      (1,1) (1,1)      (00000) (00000)      0000000000
    (1,1) (1,3)      (1,1) (1,3)      (00000) (00010)      0000000010
    (1,1) (2,2)      (1,2) (1,2)      (00001) (00001)      0000100001
    (1,1) (2,4)      (1,2) (1,4)      (00001) (00011)      0000100011
    (1,1) (3,1)      (1,3) (1,1)      (00010) (00000)      0001000000
    (1,1) (3,3)      (1,3) (1,3)      (00010) (00010)      0001000010
    (1,1) (3,5)      (1,3) (1,5)      (00010) (00100)      0001000100
    (1,1) (4,2)      (1,4) (1,2)      (00011) (00001)      0001100001
    (1,1) (4,4)      (1,4) (1,4)      (00011) (00011)      0001100011
    (1,1) (5,3)      (1,5) (1,3)      (00100) (00010)      0010000010


Problem Statement: Consider a distributed information-gathering scenario where a sink collects the data from $N$ correlated
informants. As in Section V, we divide the sequence of events in this distributed data-compression problem into a
data-generation epoch and a data-gathering epoch. However, in this case, the data-generation epoch is repeated $k$ times, $k \ge 1$, before
each data-gathering epoch, in which the sink learns each of the $k$ strings of each informant. The rest of the details of the problem setting
are the same as in Section V, and we do not repeat them here.

A sample $x = (x_1, \ldots, x_N)$ is drawn from the discrete and finite support-set $S_{X_1,\ldots,X_N}$ over $N$ binary strings. The strings
of $x$ are revealed to the informants, with string $x_i$ being given to the $i$-th informant. This process is repeated $k$ times, $k \ge 1$,
resulting in each informant accumulating $k$ instances of its data. At the end of this data-generation epoch, the sink begins
data-gathering to learn each of the $k$ strings of each informant. We are interested in minimizing the total number of informant
bits required, in the worst-case, to enable the sink to losslessly ($P_e = 0$) learn each of the $k$ instances of $x$ revealed to the
informants in the previous $k$ data-generation epochs.

Consider an alternative problem formulation in terms of a new problem-encoding scheme that also facilitates the design and 
analysis of optimal solutions in our setting. 

Alternate Problem Statement: Assume that the $k$-extended data-vector $x^k = (x_1^k, \ldots, x_N^k)$, $x_i^k = (x_i^1, \ldots, x_i^k)$, is drawn from the
discrete and finite $k$-extended support-set $S^k_{X_1,\ldots,X_N}$ over $N$ binary strings. The strings of $x^k$ are revealed, unbeknownst to
the sink, to the informants, with the string $x_i^k$ given to the $i$-th informant, $i \in \mathcal{N}$.

The sink knows that one of the strings from $S^k_{X_1,\ldots,X_N}$ is drawn and that its different components are given to different
informants. The objective of data-gathering is to enable the sink to learn the identity of this string by communicating with
the different informants.

In our asymmetric communication scenario, the sink knows the support-set $S^k_{X_1,\ldots,X_N}$. We assume that each informant $i$, $i \in \mathcal{N}$,
knows its $k$-extended support-set $S^k_{X_i}$. The sink can uniquely describe every $x^k$, $x^k \in S^k_{X_1,\ldots,X_N}$, in $\lceil k \log \mu_{X_1,\ldots,X_N} \rceil$ bits.
However, for the same reasons as in Section V, to efficiently query the informants for the purpose of data-gathering, the sink
can also uniquely encode every $x^k$ in terms of a set $B_{x^k}$ of $\sum_{i \in \mathcal{N}} \lceil k \log \mu_{X_i} \rceil$ bits by concatenating the $\lceil k \log \mu_{X_i} \rceil$-bit-long
representation of each $x_i^k$, $i \in \mathcal{N}$.

Example 2: We illustrate the proposed problem-encoding scheme for the alternate problem statement above in Table I. Consider
the support-set $S_{X_1,X_2}$ of two jointly distributed random variables $X_1$ and $X_2$ defined over the alphabet $\mathcal{X} = \{1, 2, 3, 4, 5\}$, with
$|S_{X_1,X_2}| = 10$, as given in the first column. The elements of the 2-extension of this support-set, $S^2_{X_1,X_2}$ with $|S^2_{X_1,X_2}| = 100$, are
listed in the second column; for the sake of brevity, we list only 10 of these elements. Recall that each element
of $S^2_{X_1,X_2}$ is the concatenation of two samples of $x = (x_1, x_2)$. In the third column, we list the corresponding 2-extended
data-block at each informant. The sink and each informant can agree a priori on some deterministic binary encoding of the
2-extension of the informant's data. In the fourth column, we give one such encoding, and the fifth column lists the concatenation
of the binary encodings of the data-blocks at each informant. ■
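
The encoding of Example 2 can be reproduced with a short sketch. It assumes that each informant's 2-block is encoded by its lexicographic rank among all 25 blocks over the alphabet, written in 5 bits; this choice agrees with the rows shown in Table I, but the paper only says that "one such encoding" is given, so the ranking rule and the names `encode_block` and `encode_extended` are assumptions.

```python
from itertools import product

# Support-set of Example 2: ten pairs (x1, x2) over the alphabet {1, ..., 5}.
SUPPORT = [(1, 1), (1, 3), (2, 2), (2, 4), (3, 1),
           (3, 3), (3, 5), (4, 2), (4, 4), (5, 3)]
ALPHABET = [1, 2, 3, 4, 5]
K = 2
BLOCKS = list(product(ALPHABET, repeat=K))   # all 25 two-blocks, in lexicographic order
WIDTH = (len(BLOCKS) - 1).bit_length()       # ceil(2 * log2(5)) = 5 bits per informant


def encode_block(block):
    """Binary-encode one informant's 2-block by its (assumed) lexicographic rank."""
    return format(BLOCKS.index(block), f"0{WIDTH}b")


def encode_extended(samples):
    """Concatenate the per-informant encodings of a 2-extended data-vector."""
    informant_1 = tuple(s[0] for s in samples)   # (x_1^1, x_1^2)
    informant_2 = tuple(s[1] for s in samples)   # (x_2^1, x_2^2)
    return encode_block(informant_1) + encode_block(informant_2)


for second in SUPPORT:                           # the ten rows listed in Table I
    print((1, 1), second, encode_extended(((1, 1), second)))
```

Under this assumption, all 100 elements of the 2-extended support-set receive distinct 10-bit strings, since each informant's block is encoded injectively.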

With this encoding scheme, the worst-case bit-compressibility problem with block-length $k$ is to identify the smallest subset
of bit-locations, of size $C_B^k$, in the concatenated bit-representation of $x^k$, whose values the sink must know to unambiguously
learn the $x^k$ revealed to the informants. That is,

$$C_B^k = \max_{x^k \in S^k_{X_1,\ldots,X_N}} \ \min_{b_{x^k} \subseteq B_{x^k}} |b_{x^k}| \quad \text{such that} \quad S^k_{X_1,\ldots,X_N \mid b_{x^k}} = \{x^k\} \qquad (19)$$

where $S^k_{X_1,\ldots,X_N \mid b_{x^k}}$ is the $k$-extended conditional ambiguity set when the subset $b_{x^k}$, $b_{x^k} \subseteq B_{x^k}$, is known at the sink. In the
next subsection, we discuss the solution of (19).

A. Worst-case Bit-Compressibility with Block-coding 

From the previous discussion, it is easy to observe that $C_B^k$ satisfies: $\lceil k \log \mu_{X_1,\ldots,X_N} \rceil \le C_B^k \le \sum_{i=1}^{N} \lceil k \log \mu_{X_i} \rceil$. Therefore,
for asymptotic block-lengths, $k \to \infty$, the $k$-block bit-compressibility per block satisfies:

$$\log \mu_{X_1,\ldots,X_N} \overset{(a)}{\le} \lim_{k \to \infty} \frac{C_B^k}{k} \overset{(b)}{\le} \sum_{i=1}^{N} \log \mu_{X_i},$$

where (a) follows from $\lim_{k \to \infty} \lceil kx \rceil / k = x$ and (b) follows from

$$\frac{1}{k} \sum_{i=1}^{N} \lceil k \log \mu_{X_i} \rceil < \sum_{i=1}^{N} \log \mu_{X_i} + \frac{N}{k},$$

$$\lim_{k \to \infty} \frac{1}{k} \sum_{i=1}^{N} \lceil k \log \mu_{X_i} \rceil \le \sum_{i=1}^{N} \log \mu_{X_i},$$

as $\lim_{k \to \infty} N/k = 0$ for any finite and fixed $N$.

Recall from Section V that the worst-case bit-compressibility for oneshot compression satisfies: $\lceil \log \mu_{X_1,\ldots,X_N} \rceil \le C_B \le
\sum_{i=1}^{N} \lceil \log \mu_{X_i} \rceil$. This implies that, compared to oneshot computation, block-coding improves the lower bound on
bit-compressibility by no more than one bit, and the corresponding upper bound is reduced by at most one bit per informant. This
leads us to conclude that, for the worst-case distributed compression problem in our setting, block-coding offers almost no
gain with respect to oneshot compression. Therefore, oneshot compression is almost optimal with respect to solving the
worst-case DSC problem in our setting.
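
To make the size of the possible gain concrete, the per-block bounds $\lceil k \log \mu_{X_1,\ldots,X_N} \rceil / k$ and $\sum_i \lceil k \log \mu_{X_i} \rceil / k$ can be evaluated exactly for a few block-lengths. The sketch below is only an illustration; the function names are hypothetical, and it uses the identity $\lceil k \log \mu \rceil = \lceil \log \mu^k \rceil$ so that all arithmetic stays in integers.

```python
from fractions import Fraction


def ceil_log2(n: int) -> int:
    """Return ceil(log2(n)) exactly for an integer n >= 1."""
    return (n - 1).bit_length()


def per_block_bounds(mu_joint: int, mu_marginals: list[int], k: int):
    """Return (lower, upper) bounds on C_B^k / k as exact fractions.

    lower = ceil(k log mu_joint) / k and upper = sum_i ceil(k log mu_i) / k.
    """
    lower = Fraction(ceil_log2(mu_joint ** k), k)
    upper = Fraction(sum(ceil_log2(m ** k) for m in mu_marginals), k)
    return lower, upper


# Ambiguities of Example 2: joint ambiguity 10, each informant's ambiguity 5.
for k in (1, 2, 10, 1000):
    lower, upper = per_block_bounds(10, [5, 5], k)
    print(k, float(lower), float(upper))
```

For the ambiguities of Example 2, the bounds are 4 and 6 bits at $k = 1$ and 3.5 and 5 bits per block at $k = 2$, approaching $\log 10 \approx 3.32$ and $2 \log 5 \approx 4.64$ as $k$ grows, which is consistent with a gain of less than one bit in the lower bound and less than one bit per informant in the upper bound.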

To formally prove that $C_B - C_B^k / k < 1$ for all $k \ge 1$, a communication protocol to optimally solve the problem in (19) can be
designed as a $k$, $k \ge 1$, block generalization of the Bit-Serial protocol introduced in Section VI for solving the worst-case DSC
problem for $k = 1$ (oneshot compression). Replacing the various support-sets and ambiguities in the Bit-Serial protocol by their
$k$-extended equivalents and using the problem-encoding scheme proposed above in this section results in the desired protocol,
called the "$k$-extended Bit-Serial" protocol, given below.

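Protocol: $k$-extended Bit-Serial

$l = 0$
Let $S^{k,l}_{X_1,\ldots,X_N} = S^{k}_{X_1,\ldots,X_N}$
Let $V = \{1, \ldots, \sum_{i=1}^{N} \lceil k \log \mu_{X_i} \rceil\}$: index set of all bit-locations in $B_{x^k}$
Let $U$ be the index set of undefined bits in $V$, $U \subseteq V$
while ($|S^{k,l}_{X_1,\ldots,X_N}| > 1$)
    $J^{l+1} = \arg\min_{j \in U} \big|\, |S^{k,l}_{X_1,\ldots,X_N \mid b(j)=0}| - |S^{k,l}_{X_1,\ldots,X_N \mid b(j)=1}| \,\big|$
    If $|J^{l+1}| > 1$, then choose uniformly at random the bit-location $j^{l+1} \in J^{l+1}$
    The sink asks the informant corresponding to bit-location $j^{l+1}$ to send bit-value $b(j^{l+1})$
    Set $S^{k,l+1}_{X_1,\ldots,X_N} = S^{k,l}_{X_1,\ldots,X_N \mid b(j^{l+1})}$
    Compute $U \subseteq V$, the set of undefined bits
    $l = l + 1$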

The proof of optimality of the $k$-extended Bit-Serial protocol for the worst-case $k$-block distributed compression is obtained
by following the same reasoning that was used to prove the optimality of the Bit-Serial protocol in Section VI. Therefore, we
omit it here.
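
The polling loop of the protocol can be sketched as a small simulation. This is a minimal illustration under stated assumptions, not the authors' implementation: the support-set is given directly as a set of equal-length bit strings (the concatenated representations), an "undefined" bit-location is interpreted as one that still takes both values within the current ambiguity set, the best location is the one whose two induced subsets are closest in size, and ties are broken with a seeded pseudo-random choice. The function name `bit_serial` and its parameters are hypothetical.

```python
import random


def bit_serial(support, truth, seed=0):
    """Simulate greedy one-bit-per-round polling over equal-length bit strings.

    support: set of candidate bit strings known to the sink.
    truth:   the string actually held (in pieces) by the informants.
    Returns (identified string, list of polled bit-locations).
    """
    if truth not in support:
        raise ValueError("truth must belong to the support-set")
    rng = random.Random(seed)
    ambiguity = set(support)
    polled = []
    while len(ambiguity) > 1:
        # Undefined bit-locations: still taking both values in the ambiguity set.
        undefined = [j for j in range(len(truth))
                     if len({s[j] for s in ambiguity}) == 2]

        def imbalance(j):
            """Absolute difference between the sizes of the two induced subsets."""
            zeros = sum(1 for s in ambiguity if s[j] == "0")
            return abs(2 * zeros - len(ambiguity))

        best = min(imbalance(j) for j in undefined)
        j = rng.choice([j for j in undefined if imbalance(j) == best])
        polled.append(j)                                         # sink polls this bit
        ambiguity = {s for s in ambiguity if s[j] == truth[j]}   # informant's reply
    return ambiguity.pop(), polled
```

As a usage example, calling `bit_serial({"000", "011", "101", "110"}, "101")` identifies `"101"` after polling two bit-locations, because every bit-location splits this four-element set into two halves.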



Using the $k$-extended Bit-Serial protocol, in Figures 9 and 10 we plot the worst-case achievable rate-regions for two different
support-sets of two correlated informants with block-length $k = 1, 2$, and $k \to \infty$. These figures demonstrate the limiting
behaviour of the sum and individual information rates as a function of block-length for two particular distributions, and establish
the near-optimality of solving the worst-case distributed compression problem with only a single instance of the informant
data-vector.

Next, we compute the achievable rate-region for the worst-case DSC problem with block-coding in terms of the achievable 
rate-region for the worst-case distributed compression problem with single instance of informant data-vector. The following 
lemma states our result. 
Fig. 9. (a) Support-set $S_{X_1,X_2}$ with $\mu_{X_1,X_2} = 10$; (b) corresponding worst-case achievable rate-regions for data-block length $k = 1, 2$ and $k \to \infty$.
Fig. 10. (a) Support-set $S_{X_1,X_2}$ with $\mu_{X_1,X_2} = 9$; (b) corresponding worst-case achievable rate-regions for data-block length $k = 1, 2$ and $k \to \infty$.



Lemma 21: Given the set of achievable rate-vectors for the worst-case distributed compression problem for oneshot
compression, as in Corollary 3, the set of achievable rate-vectors for the worst-case distributed compression problem with
$k$, $k \ge 1$, block-coding is defined by

$$R^k(S) \ge \frac{\lceil k \log(2^{M_{R(S)} - 1} + 1) \rceil}{k}$$

for all $S \subseteq \mathcal{N}$, where $R^k(S) = \sum_{i \in S} R_i$.

Proof: Consider the constraint $R_1 \ge M_{R_1}$. The size of the set that can be described in $M_{R_1}$ bits from Informant 1 lies between
$2^{M_{R_1} - 1} + 1$ and $2^{M_{R_1}}$. Therefore, the size of the $k$-extension of this set lies between $(2^{M_{R_1} - 1} + 1)^k$ and $2^{k M_{R_1}}$. This
implies that the minimum number of bits per block required from Informant 1 to describe the $k$-extended set is at least

$$\frac{\lceil k \log(2^{M_{R_1} - 1} + 1) \rceil}{k}.$$

An identical argument proves the other constraints in the statement of the lemma for all other subsets of $\mathcal{N}$. Combining
all the proofs proves the lemma. ■
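
The bound of Lemma 21 can be evaluated exactly for small values to see how little block-coding lowers the per-block requirement. The sketch below is an illustration only; the function name `block_rate_lower_bound` and its parameter `m_bits` (standing for the oneshot requirement $M_{R(S)}$) are hypothetical.

```python
from fractions import Fraction


def block_rate_lower_bound(m_bits: int, k: int) -> Fraction:
    """Return ceil(k * log2(2**(m_bits - 1) + 1)) / k as an exact fraction.

    m_bits is the oneshot bit requirement (an integer >= 1) and k >= 1 is the
    block-length.
    """
    smallest_set = 2 ** (m_bits - 1) + 1    # smallest set size that needs m_bits bits
    extended_size = smallest_set ** k       # size of its k-th extension
    # ceil(log2(n)) equals (n - 1).bit_length() for an integer n >= 1.
    return Fraction((extended_size - 1).bit_length(), k)


for k in (1, 2, 3, 100):
    print(k, float(block_rate_lower_bound(3, k)))
```

For example, with a oneshot requirement of 3 bits the bound is 3 bits at $k = 1$ and 2.5 bits per block at $k = 2$, and it stays above $\log 5 \approx 2.32$, that is, within one bit of the oneshot value.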

VIII. Conclusions and Future Work 

We consider the classical problem of Distributed Source Coding in Information Theory. We propose a new canonical scheme
to construct and address different variants of this problem.

The classical distributed source coding (DSC) problem in Information Theory finds a natural application in addressing the
data-gathering problem in wireless sensor networks, where sensor data is often assumed to be correlated. However, existing
approaches to the distributed source coding problem cannot be employed directly to solve the data-gathering problem in
wireless sensor networks. In this paper, therefore, we propose a variant of the distributed source coding problem that works with
a single instance of the sensor data-vector to reduce the latency of data-gathering, employs interactive communication to reduce
the expenditure of communication and computation resources at the nodes, and does not require sensor nodes to have complete
knowledge of the entire network. Further, to perform the worst-case information-theoretic analysis of certain problems in
wireless sensor networks, we propose the notion of information ambiguity, prove that it is a valid information measure, and
derive its various properties.

We provide optimum and constructive solutions of the proposed variant of the distributed source coding problem in two 
communication scenarios in terms of two respective protocols and prove that unlike the average-case performance of distributed 
source coding problems, the worst-case performance of such problems is not enhanced by employing block-coding and the 
optimal worst-case performance can be achieved just with a single instance of source data-vector. 

We have also proposed a system-architecture to implement our work in actual data-gathering wireless sensor networks to 
enhance their lifetime. However, the details of such extensions of our work are beyond the scope of this paper and are discussed 
in one of our forthcoming submissions. 

Future Work: We are currently working towards generalizing classical Information Theory in new directions so that it can be
applied more meaningfully to various other problems that cannot be addressed in the existing framework. In particular,
we are working towards extending the results in the current paper to distributed compression problems where the informants 
do not communicate directly with the sink but do so via some intermediate nodes. The operations that the intermediate nodes 
can perform, depending on their computational capabilities, on informants' data determine the maximum compression that can 
be achieved. We are also attempting to generalize the notion of information ambiguity to discrete and infinite, and continuous 
support-sets as such generalizations have interesting implications in some problems in distributed inference and learning. We 
plan to continue to work on such problems. 

We have striven for a systems-level understanding of the problem of communication under very general conditions and a 
systems-level solution to the problem. Nowhere did the problem formulation or the proposed solution rely on the details and
specifics of the agents that make up the system. If we change the nature of the probability distributions or the objective functions
of the resulting optimization problem, the system can represent diverse models of real-world situations. Therefore, our
solution techniques, approaches, and heuristics are independent of the specific application domains. Our continuing goal will 
be to apply our framework to understand communication in complex systems like intra-cellular communication and neural 
information processing systems among others. 

We recognize that our framework is in its infancy. However, our approach suggests a systematic and principled framework
for generating generalized information-like measures that are fine-tuned to aid in the task of optimal design and analysis
of systems with communicating agents. Depending on the underlying objective functions and constraints of the optimization
problem, we expect that a zoo of information-like measures will arise. As readers may have noticed, the proofs and results in this
work use tools from several disciplines. As we generalize our work, we expect to find intricate and deep connections between
these disciplines and hope to pursue them concurrently in the future.